PostgreSQL Source Code (146): Binary File Format Analysis

Published: 2025-06-06

Related
How the Linux function call stack works (x86)

Quick reference

# View the ELF header
readelf -h bin/postgres

# List the sections
readelf -S bin/postgres
(gdb) info file
(gdb) maint info sections

# Disassemble code in the .text section
(gdb) disassemble 0x48e980,0x48e9b0
(gdb) disassemble main

# Find which function and source line an address in .text belongs to
(gdb) info line *0x7b7d90

# View the segments (execution view)
readelf -l bin/postgres

Executable file formats

Common executable file formats:

  • Windows: PE (Portable Executable)
  • Linux/Unix: ELF (Executable and Linkable Format)
  • macOS / iOS: Mach-O

When built on Linux, the postgres executable is an ELF file.

$ file bin/postgres

bin/postgres: ELF 64-bit LSB executable, 
x86-64, 
version 1 (SYSV), 
dynamically linked, 
interpreter /lib64/ld-linux-x86-64.so.2, 
for GNU/Linux 3.2.0, 
BuildID[sha1]=c7ab1c85b211f05bbc06a69566f82b05233782f5, 
with debug_info, 
not stripped, 
too many notes (256)

libpq.a, the static library (an ar archive of object files)

$ file lib/libpq.a

lib/libpq.a: current ar archive

libpq.so, the shared library

$ file lib/libpq.so.5.16

lib/libpq.so.5.16: ELF 64-bit LSB shared object, 
x86-64, version 1 (SYSV), 
dynamically linked, 
BuildID[sha1]=7bd87aa5ae3f13463c4ddd66f8d7f6cf1beab3fa, 
with debug_info, 
not stripped

Two views of an ELF file

  • Static view: Linking View
  • Execution view: Execution View

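Execution view vs. linking view:

  • Linking view: made up of Sections, which describe how code and data are partitioned at link time (e.g. .text, .rodata).
  • Execution view: made up of Segments, which describe how memory is organized at run time. One Segment may contain several Sections.
  • In short, Sections form the static view and Segments form the dynamic view: Segments describe how data is laid out in the process's virtual address space (VAS) when the program actually runs.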

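What Segments mean in an ELF file:

The Segments listed in the ELF Program Header table describe the program's layout once it is loaded into memory. Each Segment specifies:

  • The virtual address range in the process VAS that it occupies (for example, the ranges that hold the code from .text or the data from .data).
  • Access permissions (readable, writable, executable).
  • Its file offset, size in the file, and size in memory (p_offset, p_filesz, p_memsz).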
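
As a rough illustration of the fields listed above, the following Python sketch decodes one 56-byte ELF64 program header entry with the standard struct module. The helper name parse_phdr is made up for this example, and the sample values are copied from the first LOAD entry of the readelf -l output shown later in this article; only 64-bit little-endian files are assumed.

```python
import struct

# Elf64_Phdr layout (little endian): p_type, p_flags, p_offset, p_vaddr,
# p_paddr, p_filesz, p_memsz, p_align -> 2 x u32 + 6 x u64 = 56 bytes.
PHDR_FORMAT = "<IIQQQQQQ"
PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def parse_phdr(raw):
    """Decode one ELF64 program header (hypothetical helper for this sketch)."""
    (p_type, p_flags, p_offset, p_vaddr,
     _p_paddr, p_filesz, p_memsz, p_align) = struct.unpack(PHDR_FORMAT, raw)
    perms = "".join(
        letter if p_flags & bit else "-"
        for letter, bit in (("R", PF_R), ("W", PF_W), ("X", PF_X))
    )
    return {
        "loadable": p_type == PT_LOAD,
        "perms": perms,
        "file_range": (p_offset, p_offset + p_filesz),
        "vaddr_range": (p_vaddr, p_vaddr + p_memsz),
        "align": p_align,
    }


# Sample entry rebuilt from the first LOAD line of "readelf -l bin/postgres".
sample = struct.pack(PHDR_FORMAT, PT_LOAD, PF_R | PF_X,
                     0x0, 0x400000, 0x400000, 0xb55668, 0xb55668, 0x200000)
first_load = parse_phdr(sample)
print(first_load["perms"], [hex(a) for a in first_load["vaddr_range"]])
```

The 56-byte size matches the "Size of program headers" value that readelf -h reports for this binary.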

Static view: analyzing the ELF file with GDB

The postgres file

$ readelf -h bin/postgres
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x48e980
  Start of program headers:          64 (bytes into file)
  Start of section headers:          41318232 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         38
  Section header string table index: 37
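  • The Magic field identifies the file as ELF: after the leading 0x7f, the bytes 45 4c 46 are the ASCII codes for E, L, F.
  • ELF type: EXEC (Executable file)
  • Address of the first instruction executed when the program runs (the entry point): 0x48e980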
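
The fields that readelf -h prints can also be read directly from the first 64 bytes of the file. The sketch below is a minimal example, assuming a 64-bit little-endian ELF; the function name parse_elf_header is invented for illustration, and the sample bytes are rebuilt from the header values shown above so that the example runs without the actual binary.

```python
import struct

# Elf64_Ehdr layout: 16-byte e_ident, then e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_elf_header(raw):
    """Decode the first 64 bytes of a little-endian ELF64 file."""
    fields = struct.unpack(EHDR_FORMAT, raw[:64])
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return {
        "type": ELF_TYPES.get(fields[1], hex(fields[1])),
        "machine": fields[2],
        "entry": fields[4],
        "phoff": fields[5],
        "shoff": fields[6],
        "phentsize": fields[9],
        "phnum": fields[10],
        "shentsize": fields[11],
        "shnum": fields[12],
        "shstrndx": fields[13],
    }


# Header rebuilt from the readelf -h values above (machine 62 is x86-64).
sample_header = struct.pack(
    EHDR_FORMAT, bytes.fromhex("7f454c46020101") + bytes(9),
    2, 62, 1, 0x48e980, 64, 41318232, 0, 64, 56, 9, 64, 38, 37)
header = parse_elf_header(sample_header)
print(header["type"], hex(header["entry"]), header["phnum"], header["shnum"])
```

To run it against a real file, pass the first 64 bytes read from it in binary mode, for example open("bin/postgres", "rb").read(64).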

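Use gdb to confirm that 0x48e980 lies in the .text section (the program's own code lives in .text; .init, .plt and .fini hold only small startup, stub and teardown pieces)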

(gdb) info file
Symbols from "/data/mingjie/pgroot99/pghome/bin/postgres".
Local exec file:
	`/data/mingjie/pgroot99/pghome/bin/postgres', file type elf64-x86-64.
	Entry point: 0x48e980
	0x0000000000400238 - 0x0000000000400254 is .interp              # path of the dynamic linker
	0x0000000000400254 - 0x0000000000400274 is .note.ABI-tag        # build-environment metadata
	0x0000000000400274 - 0x0000000000400298 is .note.gnu.build-id
	0x0000000000400298 - 0x0000000000414748 is .gnu.hash            # hash table over the dynamic symbol table, speeds up symbol lookup
	0x0000000000414748 - 0x0000000000454300 is .dynsym              # dynamic symbol table (function/variable symbols)
	0x0000000000454300 - 0x0000000000485903 is .dynstr              # string table holding the names referenced by .dynsym
	0x0000000000485904 - 0x000000000048adfe is .gnu.version
	0x000000000048ae00 - 0x000000000048afa0 is .gnu.version_r
	0x000000000048afa0 - 0x000000000048b138 is .rela.dyn
	0x000000000048b138 - 0x000000000048d2e0 is .rela.plt
	0x000000000048d2e0 - 0x000000000048d2fb is .init
	0x000000000048d300 - 0x000000000048e980 is .plt       # procedure linkage table; together with .got.plt it implements lazy binding of shared-library functions
	0x000000000048e980 - 0x0000000000bf4e04 is .text      # executable code   <<<<<<< 0x48e980
	0x0000000000bf4e04 - 0x0000000000bf4e11 is .fini
	0x0000000000bf5000 - 0x0000000000e662e0 is .rodata    # read-only data (string literals, global constants, etc.)
	0x0000000000e662e0 - 0x0000000000e95a5c is .eh_frame_hdr
	0x0000000000e95a60 - 0x0000000000f55668 is .eh_frame  # unwind information used for exception handling and backtraces
	0x0000000001155cd0 - 0x0000000001155cd8 is .init_array  # list of constructor function pointers
	0x0000000001155cd8 - 0x0000000001155ce0 is .fini_array  # list of destructor function pointers
	0x0000000001155ce0 - 0x0000000001155d68 is .data.rel.ro
	0x0000000001155d68 - 0x0000000001155fc8 is .dynamic
	0x0000000001155fc8 - 0x0000000001155fe8 is .got
	0x0000000001156000 - 0x0000000001156b50 is .got.plt
	0x0000000001156b60 - 0x000000000116e9b8 is .data      # initialized (non-zero) global and static variables
	0x000000000116e9c0 - 0x00000000011a4a60 is .bss       # uninitialized or zero-initialized global/static variables (zero-filled at load time)

maint info sections lists the same sections, together with their file offsets and flags:

(gdb) maint info sections
Exec file:
    `/data/mingjie/pgroot99/pghome/bin/postgres', file type elf64-x86-64.
 [0]      0x00400238->0x00400254 at 0x00000238: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS
 [1]      0x00400254->0x00400274 at 0x00000254: .note.ABI-tag ALLOC LOAD READONLY DATA HAS_CONTENTS
 [2]      0x00400274->0x00400298 at 0x00000274: .note.gnu.build-id ALLOC LOAD READONLY DATA HAS_CONTENTS
 [3]      0x00400298->0x00414748 at 0x00000298: .gnu.hash ALLOC LOAD READONLY DATA HAS_CONTENTS
 [4]      0x00414748->0x00454300 at 0x00014748: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS
 [5]      0x00454300->0x00485903 at 0x00054300: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS
 [6]      0x00485904->0x0048adfe at 0x00085904: .gnu.version ALLOC LOAD READONLY DATA HAS_CONTENTS
 [7]      0x0048ae00->0x0048afa0 at 0x0008ae00: .gnu.version_r ALLOC LOAD READONLY DATA HAS_CONTENTS
 [8]      0x0048afa0->0x0048b138 at 0x0008afa0: .rela.dyn ALLOC LOAD READONLY DATA HAS_CONTENTS
 [9]      0x0048b138->0x0048d2e0 at 0x0008b138: .rela.plt ALLOC LOAD READONLY DATA HAS_CONTENTS
 [10]     0x0048d2e0->0x0048d2fb at 0x0008d2e0: .init ALLOC LOAD READONLY CODE HAS_CONTENTS
 [11]     0x0048d300->0x0048e980 at 0x0008d300: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS
 [12]     0x0048e980->0x00bf4e04 at 0x0008e980: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
 [13]     0x00bf4e04->0x00bf4e11 at 0x007f4e04: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS
 [14]     0x00bf5000->0x00e662e0 at 0x007f5000: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
 [15]     0x00e662e0->0x00e95a5c at 0x00a662e0: .eh_frame_hdr ALLOC LOAD READONLY DATA HAS_CONTENTS
 [16]     0x00e95a60->0x00f55668 at 0x00a95a60: .eh_frame ALLOC LOAD READONLY DATA HAS_CONTENTS
 [17]     0x01155cd0->0x01155cd8 at 0x00b55cd0: .init_array ALLOC LOAD DATA HAS_CONTENTS
 [18]     0x01155cd8->0x01155ce0 at 0x00b55cd8: .fini_array ALLOC LOAD DATA HAS_CONTENTS
 [19]     0x01155ce0->0x01155d68 at 0x00b55ce0: .data.rel.ro ALLOC LOAD DATA HAS_CONTENTS
 [20]     0x01155d68->0x01155fc8 at 0x00b55d68: .dynamic ALLOC LOAD DATA HAS_CONTENTS
 [21]     0x01155fc8->0x01155fe8 at 0x00b55fc8: .got ALLOC LOAD DATA HAS_CONTENTS
 [22]     0x01156000->0x01156b50 at 0x00b56000: .got.plt ALLOC LOAD DATA HAS_CONTENTS
 [23]     0x01156b60->0x0116e9b8 at 0x00b56b60: .data ALLOC LOAD DATA HAS_CONTENTS
 [24]     0x0116e9c0->0x011a4a60 at 0x00b6e9b8: .bss ALLOC
 [25]     0x00000000->0x0000005a at 0x00b6e9b8: .comment READONLY HAS_CONTENTS
 [26]     0x015a4a60->0x015a8ef4 at 0x00b6ea14: .gnu.build.attributes READONLY HAS_CONTENTS
 [27]     0x00000000->0x00009770 at 0x00b72ea8: .debug_aranges READONLY HAS_CONTENTS
 [28]     0x00000000->0x011476f4 at 0x00b7c618: .debug_info READONLY HAS_CONTENTS
 [29]     0x00000000->0x000bd016 at 0x01cc3d0c: .debug_abbrev READONLY HAS_CONTENTS
 [30]     0x00000000->0x004fdf94 at 0x01d80d22: .debug_line READONLY HAS_CONTENTS
 [31]     0x00000000->0x00181834 at 0x0227ecb6: .debug_str READONLY HAS_CONTENTS
 [32]     0x00000000->0x0000b990 at 0x024004ea: .debug_ranges READONLY HAS_CONTENTS
 [33]     0x00000000->0x0022b286 at 0x0240be7a: .debug_macro READONLY HAS_CONTENTS

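.text section

Use x to print memory at addresses in .text; gdb annotates each line with the enclosing function name, which is very convenient.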

(gdb) x/32 0x48e980
0x48e980 <_start>:	0xfa1e0ff3	0x8949ed31	0x89485ed1	0xe48348e2
0x48e990 <_start+16>:	0x495450f0	0x4d80c0c7	0xc74800bf	0xbf4d10c1
0x48e9a0 <_start+32>:	0xc7c74800	0x007b7d7d	0x762a15ff	0x90f400cc
0x48e9b0 <_dl_relocate_static_pie>:	0xfa1e0ff3	0x0f2e66c3	0x0000841f	0x90000000
0x48e9c0 <deregister_tm_clones>:	0xf13d8d48	0x4800cdff	0xffea058d	0x394800cd
0x48e9d0 <deregister_tm_clones+16>:	0x481574f8	0x75ee058b	0x854800cc	0xff0974c0
0x48e9e0 <deregister_tm_clones+32>:	0x801f0fe0	0x00000000	0x801f0fc3	0x00000000
0x48e9f0 <register_tm_clones>:	0xc13d8d48	0x4800cdff	0xffba358d	0x294800cd

(gdb) x/32 main
0x7b7d7d <main>:	0xe5894855	0x20ec8348	0x48ec7d89	0xc6e07589
0x7b7d8d <main+16>:	0xc601ff45	0x9ba44905	0x8b480100	0x8b48e045
0x7b7d9d <main+32>:	0xc7894800	0x43770ce8	0x05894800	0x009e6c33
0x7b7dad <main+48>:	0x2c058b48	0x48009e6c	0xc8e8c789	0x48000002

The job of _start is to get the program's main function called. main starts at 0x7b7d7d, so how does _start reach it? Look at the disassembly:

(gdb) disassemble 0x48e980 , 0x48e9b0
Dump of assembler code from 0x48e980 to 0x48e9b0:
   0x000000000048e980 <_start+0>:	endbr64
   0x000000000048e984 <_start+4>:	xor    %ebp,%ebp
   0x000000000048e986 <_start+6>:	mov    %rdx,%r9
   0x000000000048e989 <_start+9>:	pop    %rsi
   0x000000000048e98a <_start+10>:	mov    %rsp,%rdx
   0x000000000048e98d <_start+13>:	and    $0xfffffffffffffff0,%rsp
   0x000000000048e991 <_start+17>:	push   %rax
   0x000000000048e992 <_start+18>:	push   %rsp
   0x000000000048e993 <_start+19>:	mov    $0xbf4d80,%r8
   0x000000000048e99a <_start+26>:	mov    $0xbf4d10,%rcx
   0x000000000048e9a1 <_start+33>:	mov    $0x7b7d7d,%rdi
   0x000000000048e9a8 <_start+40>:	callq  *0xcc762a(%rip)        # 0x1155fd8
   0x000000000048e9ae <_start+46>:	hlt
   0x000000000048e9af <.annobin_static_reloc.c_end+0>:	nop

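mov $0x7b7d7d,%rdi places the address of main in rdi, the first-argument register (not in rip). The callq then makes an indirect call through the pointer stored at 0x1155fd8, which lies inside .got; in a glibc-linked program that slot resolves to __libc_start_main, which sets up the runtime and then calls main through the pointer it received.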
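
The operand of that callq is RIP-relative: the effective address is the address of the next instruction plus the displacement, which is how gdb arrives at the value in its trailing comment. A small Python sketch of the arithmetic, using only addresses that appear in the listings in this article (the function name is invented for illustration, and only positive displacements are considered):

```python
def rip_relative_target(next_insn_addr, displacement):
    """Effective address of a RIP-relative operand.

    RIP already points at the *next* instruction when the operand is evaluated.
    """
    return next_insn_addr + displacement


# callq *0xcc762a(%rip) is at 0x48e9a8; the following hlt is at 0x48e9ae.
got_slot = rip_relative_target(0x48e9ae, 0xcc762a)

# .got range as reported by "info file".
GOT_START, GOT_END = 0x1155fc8, 0x1155fe8
print(hex(got_slot), GOT_START <= got_slot < GOT_END)

# Same rule for "movb $0x1,0x9ba449(%rip)" in main: next instruction at 0x7b7d97.
reached_main_addr = rip_relative_target(0x7b7d97, 0x9ba449)
print(hex(reached_main_addr))
```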

To view the assembly of a particular function, note that disassemble accepts a function name as well as an address:

(gdb) disassemble main
Dump of assembler code for function main:
   0x00000000007b7d7d <+0>:	push   %rbp
   0x00000000007b7d7e <+1>:	mov    %rsp,%rbp
   0x00000000007b7d81 <+4>:	sub    $0x20,%rsp
   0x00000000007b7d85 <+8>:	mov    %edi,-0x14(%rbp)
   0x00000000007b7d88 <+11>:	mov    %rsi,-0x20(%rbp)
   0x00000000007b7d8c <+15>:	movb   $0x1,-0x1(%rbp)
   0x00000000007b7d90 <+19>:	movb   $0x1,0x9ba449(%rip)        # 0x11721e0 <reached_main>
   0x00000000007b7d97 <+26>:	mov    -0x20(%rbp),%rax
   0x00000000007b7d9b <+30>:	mov    (%rax),%rax
   0x00000000007b7d9e <+33>:	mov    %rax,%rdi
   0x00000000007b7da1 <+36>:	callq  0xbef4b2 <get_progname>
   0x00000000007b7da6 <+41>:	mov    %rax,0x9e6c33(%rip)        # 0x119e9e0 <progname>
   0x00000000007b7dad <+48>:	mov    0x9e6c2c(%rip),%rax        # 0x119e9e0 <progname>
   ...
   ...

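Given an address, how do you find the function it belongs to and the start and end addresses of its source line? Take 0x7b7d90 from the main listing above: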

(gdb) info line *0x7b7d90
Line 64 of "main.c" starts at address 0x7b7d90 <main+19> and ends at 0x7b7d97 <main+26>.

.rodata

The .rodata section is awkward to print with gdb; objdump output is easier to read:

$ objdump -s bin/postgres --section=.rodata | more

bin/postgres:     file format elf64-x86-64

Contents of section .rodata:
 bf5000 01000200 00000000 00000000 00000000  ................
 bf5010 00000000 00000000 00000000 00000000  ................
 bf5020 2e2e2f2e 2e2f2e2e 2f2e2e2f 7372632f  ../../../../src/
 bf5030 696e636c 7564652f 73746f72 6167652f  include/storage/
 bf5040 6974656d 7074722e 68004974 656d506f  itemptr.h.ItemPo
 bf5050 696e7465 72497356 616c6964 28706f69  interIsValid(poi
 bf5060 6e746572 29000000 2e2e2f2e 2e2f2e2e  nter)...../../..
 bf5070 2f2e2e2f 7372632f 696e636c 7564652f  /../src/include/
 bf5080 73746f72 6167652f 6275666d 67722e68  storage/bufmgr.h
 bf5090 00627566 6e756d20 3c3d204e 42756666  .bufnum <= NBuff
 bf50a0 65727300 6275666e 756d203e 3d202d4e  ers.bufnum >= -N
 bf50b0 4c6f6342 75666665 72004275 66666572  LocBuffer.Buffer
 bf50c0 49735661 6c696428 62756666 65722900  IsValid(buffer).
 bf50d0 6272696e 2e630000 69647852 656c2d3e  brin.c..idxRel->
 bf50e0 72645f72 656c2d3e 72656c6b 696e6420  rd_rel->relkind
 bf50f0 3d3d2052 454c4b49 4e445f49 4e444558  == RELKIND_INDEX
 bf5100 20262620 69647852 656c2d3e 72645f72   && idxRel->rd_r
 bf5110 656c2d3e 72656c61 6d203d3d 20425249  el->relam == BRI
 bf5120 4e5f414d 5f4f4944 00000000 00000000  N_AM_OID........
 bf5130 72657175 65737420 666f7220 4252494e  request for BRIN
 bf5140 2072616e 67652073 756d6d61 72697a61   range summariza
 bf5150 74696f6e 20666f72 20696e64 65782022  tion for index "
 bf5160 25732220 70616765 20257520 77617320  %s" page %u was
 bf5170 6e6f7420 7265636f 72646564 00627269  not recorded.bri
 bf5180 6e696e73 65727420 63787400 746d7020  ninsert cxt.tmp
 bf5190 2b206c65 6e203d3d 20707472 00000000  + len == ptr....
 bf51a0 286b6579 2d3e736b 5f666c61 67732026  (key->sk_flags &
 bf51b0 20534b5f 49534e55 4c4c2920 7c7c2028   SK_ISNULL) || (
 bf51c0 6b65792d 3e736b5f 636f6c6c 6174696f  key->sk_collatio
 bf51d0 6e203d3d 20547570 6c654465 73634174  n == TupleDescAt
 bf51e0 74722862 64657363 2d3e6264 5f747570  tr(bdesc->bd_tup
 bf51f0 64657363 2c206b65 79617474 6e6f202d  desc, keyattno -
 bf5200 2031292d 3e617474 636f6c6c 6174696f   1)->attcollatio
 bf5210 6e29006e 6b657973 5b6b6579 6174746e  n).nkeys[keyattn
 bf5220 6f202d20 315d203d 3d203000 6e6e756c  o - 1] == 0.nnul
 bf5230 6c6b6579 735b6b65 79617474 6e6f202d  lkeys[keyattno -
 bf5240 20315d20 3d3d2030 00627269 6e676574   1] == 0.bringet
 bf5250 6269746d 61702063 78740000 00000000  bitmap cxt......
 bf5260 286e6b65 79735b61 74746e6f 202d2031  (nkeys[attno - 1
 bf5270 5d203e20 30292026 2620286e 6b657973  ] > 0) && (nkeys
...
...
...
...

The dump starts at address 0xbf5000, which matches what gdb reported:

	0x0000000000bf5000 - 0x0000000000e662e0 is .rodata    # read-only data (string literals, global constants, etc.)
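
Most of what is visible in this dump is NUL-terminated C strings packed back to back. As a minimal sketch, the Python below splits a short run of bytes, copied from the objdump lines at 0xbf5020 through 0xbf5067 above, into strings together with their addresses; the helper name c_strings is an assumption of this example.

```python
# Bytes 0xbf5020 .. 0xbf5067, copied from the objdump output above.
RODATA_BASE = 0xbf5020
rodata = bytes.fromhex(
    "2e2e2f2e2e2f2e2e2f2e2e2f7372632f"
    "696e636c7564652f73746f726167652f"
    "6974656d7074722e68004974656d506f"
    "696e746572497356616c696428706f69"
    "6e74657229000000"
)


def c_strings(data, base):
    """Yield (address, text) for each non-empty NUL-separated chunk in data.

    A trailing chunk with no terminator would also be yielded, so feed it
    ranges that end on a NUL byte.
    """
    offset = 0
    for chunk in data.split(b"\x00"):
        if chunk:
            yield base + offset, chunk.decode("ascii", errors="replace")
        offset += len(chunk) + 1


strings = list(c_strings(rodata, RODATA_BASE))
for address, text in strings:
    print(hex(address), text)
```

The two strings recovered here look like the file name and the condition text of an assertion, which is consistent with the binary being built with debug options.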

Execution view: analyzing the ELF file

$ readelf -l bin/postgres

Elf file type is EXEC (Executable file)
Entry point 0x48e980
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R      0x8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000b55668 0x0000000000b55668  R E    0x200000
  LOAD           0x0000000000b55cd0 0x0000000001155cd0 0x0000000001155cd0
                 0x0000000000018ce8 0x000000000004ed90  RW     0x200000
  DYNAMIC        0x0000000000b55d68 0x0000000001155d68 0x0000000001155d68
                 0x0000000000000260 0x0000000000000260  RW     0x8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000a662e0 0x0000000000e662e0 0x0000000000e662e0
                 0x000000000002f77c 0x000000000002f77c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000b55cd0 0x0000000001155cd0 0x0000000001155cd0
                 0x0000000000000330 0x0000000000000330  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .data.rel.ro .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .init_array .fini_array .data.rel.ro .dynamic .got
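  • Program Headers: the details of each Segment.
  • Section to Segment mapping: which Sections each Segment contains.

Segments of type LOAD are mapped into the process VAS when the program runs; the remaining Segments mainly carry auxiliary information that the loader and runtime need (for example, INTERP names the dynamic linker and DYNAMIC locates the dynamic-linking data).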

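First LOAD segment: file offset 0x0, file size 0xb55668, mapped at virtual addresses 0x400000 - 0xf55668.
Its permissions are R E (read and execute), and it contains: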

   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame

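Second LOAD segment: file offset 0xb55cd0, file size 0x18ce8, mapped at virtual addresses 0x1155cd0 - 0x11a4a60 (MemSiz 0x4ed90; the memory beyond the file size is the zero-filled .bss).
Its permissions are RW (read and write), and it contains: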

   03     .init_array .fini_array .data.rel.ro .dynamic .got .got.plt .data .bss
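
The two LOAD entries are enough to translate between virtual addresses and file offsets, which is what the "at 0x..." column of maint info sections shows. The Python sketch below hard-codes the two entries from the readelf -l output above; the function names are invented for this example and it is not a general ELF loader.

```python
# (label, file offset, virtual address, file size, memory size), from readelf -l.
LOAD_SEGMENTS = [
    ("LOAD R E", 0x0,      0x400000,  0xb55668, 0xb55668),
    ("LOAD RW",  0xb55cd0, 0x1155cd0, 0x18ce8,  0x4ed90),
]


def find_segment(vaddr):
    """Return the LOAD segment whose in-memory range covers vaddr, or None."""
    for segment in LOAD_SEGMENTS:
        _label, _offset, start, _filesz, memsz = segment
        if start <= vaddr < start + memsz:
            return segment
    return None


def vaddr_to_file_offset(vaddr):
    """Map a virtual address to its file offset; None if it has no file backing."""
    segment = find_segment(vaddr)
    if segment is None:
        raise ValueError(f"{vaddr:#x} is not inside any LOAD segment")
    _label, offset, start, filesz, _memsz = segment
    delta = vaddr - start
    if delta >= filesz:
        return None  # for example .bss, which exists only in memory
    return offset + delta


text_offset = vaddr_to_file_offset(0x48e980)   # start of .text
data_offset = vaddr_to_file_offset(0x1156b60)  # start of .data
bss_offset = vaddr_to_file_offset(0x116e9c0)   # start of .bss
print(hex(text_offset), hex(data_offset), bss_offset)
```

The results for .text and .data agree with the offsets 0x0008e980 and 0x00b56b60 listed by maint info sections earlier.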
