ThinkingInBinary

A collection of penetration testing, security and programming related thoughts. Or not.


Symbolicating stripped ELF files manually

Let’s face the truth: debugging and pwning stripped ELFs is a tedious process. The lack of symbols means that we need to type a lot of addresses manually, which is error-prone and a hassle. Wouldn’t it be nice if we could add some custom symbols to the binary? Researching the process didn’t yield the resources I expected, so here I am writing a guide.

Generating a test file

Our first step is to generate a test file.

//example.c

int main() {
  return 0;
}

Then compile it without PIE, so that the addresses are fixed at link time.

$ gcc -o example example.c -no-pie

Now if we examine the ELF’s sections, we should see .symtab, the symbol table section.

$ readelf -S example

**snip**
[26] .symtab           SYMTAB           0000000000000000  00003048
       00000000000005d0  0000000000000018          27    44     8
**snip**

There it is! Now let’s strip the binary!

$ strip -p -s example

Now if we load it into GDB, no symbols should be there for us.

$ gdb ./example

pwndbg> p main
No symbol table is loaded.  Use the "file" command.

Note that I am using the pwndbg extension for GDB, which is why the prompt looks funny; all features shown here exist in vanilla GDB.

Identifying the address of main

One of the most common symbols in an ELF file is the main() function, so we need to examine the stripped ELF to identify its main function. However, this process is not as straightforward as expected: execution does not begin at main() but at _start, the code at the ELF entry point. _start calls __libc_start_main and passes it the address of main() as its first argument. Bingo!

First, we examine the ELF header to identify the entry point.

$ readelf -h example

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401020
  Start of program headers:          64 (bytes into file)
  Start of section headers:          12592 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         25
  Section header string table index: 24
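The entry point that readelf prints is just one field of the fixed-size ELF header, so it can also be read programmatically. Below is a minimal sketch, assuming a 64-bit little-endian ELF like ours; the function name read_entry_point is made up for illustration, and the demo header is rebuilt from the readelf values above rather than read from disk.

```python
import struct


def read_entry_point(header: bytes) -> int:
    """Return e_entry from the first 64 bytes of an ELF64 little-endian file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] is the class (2 = ELF64), e_ident[5] the data encoding (1 = LSB).
    if header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    # e_entry is an 8-byte field at offset 24: it follows e_ident (16 bytes),
    # e_type (2), e_machine (2) and e_version (4).
    (entry,) = struct.unpack_from("<Q", header, 24)
    return entry


# Demo header rebuilt from the readelf output above (not read from a real file).
ident = bytes.fromhex("7f454c4602010100" + "00" * 8)
header = struct.pack(
    "<16sHHIQQQIHHHHHH",
    ident,
    2,         # e_type: EXEC
    62,        # e_machine: x86-64
    1,         # e_version
    0x401020,  # e_entry
    64,        # e_phoff
    12592,     # e_shoff
    0,         # e_flags
    64,        # e_ehsize
    56,        # e_phentsize
    11,        # e_phnum
    64,        # e_shentsize
    25,        # e_shnum
    24,        # e_shstrndx
)

print(hex(read_entry_point(header)))  # 0x401020
```

To try it on the real binary, pass the first 64 bytes of the file instead, for example open("example", "rb").read(64).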

Aha! Our entry is at 0x401020. Let’s examine that code.

$ objdump -D example -j .text

0000000000401020 <.text>:                                                                      
  401020:       31 ed                   xor    %ebp,%ebp
  401022:       49 89 d1                mov    %rdx,%r9                                        
  401025:       5e                      pop    %rsi                     
  401026:       48 89 e2                mov    %rsp,%rdx                                       
  401029:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  40102d:       50                      push   %rax                               
  40102e:       54                      push   %rsp     
  40102f:       49 c7 c0 70 11 40 00    mov    $0x401170,%r8
  401036:       48 c7 c1 10 11 40 00    mov    $0x401110,%rcx
  40103d:       48 c7 c7 02 11 40 00    mov    $0x401102,%rdi
  401044:       ff 15 a6 2f 00 00       callq  *0x2fa6(%rip)        # 0x403ff0    
  40104a:       f4                      hlt

We observe a call to what we assume to be __libc_start_main, with the value 0x401102 passed in RDI, the register that holds the first argument in the System V x86-64 calling convention. If our assumption is correct, main() should be located at that address.

  401102:       55                      push   %rbp                                                                                                                                            
  401103:       48 89 e5                mov    %rsp,%rbp         
  401106:       b8 00 00 00 00          mov    $0x0,%eax            
  40110b:       5d                      pop    %rbp              
  40110c:       c3                      retq           

Indeed, that is the code that sets up an empty stack frame for our absolutely useless main. Now let us symbolicate the binary with our new-found knowledge.
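The lookup we just did by eye can be roughly scripted. The sketch below searches the bytes of _start (copied from the objdump listing above) for the encoding of mov $imm32,%rdi and decodes the immediate. It is only a heuristic, assuming a non-PIE binary whose _start loads main with exactly this instruction; a plain byte search can also match in the middle of another instruction. The names START_CODE and find_main are hypothetical.

```python
import struct

# Bytes of _start, copied instruction by instruction from the objdump listing.
START_CODE = bytes.fromhex(
    "31ed" "4989d1" "5e" "4889e2" "4883e4f0" "50" "54"
    "49c7c070114000"  # mov $0x401170,%r8
    "48c7c110114000"  # mov $0x401110,%rcx
    "48c7c702114000"  # mov $0x401102,%rdi
    "ff15a62f0000"    # callq *0x2fa6(%rip)
    "f4"              # hlt
)

# REX.W prefix + opcode C7 /0 with %rdi as destination: mov $imm32,%rdi
MOV_IMM32_RDI = b"\x48\xc7\xc7"


def find_main(code: bytes) -> int:
    """Return the immediate loaded into %rdi, which we assume is main's address."""
    idx = code.find(MOV_IMM32_RDI)
    if idx < 0:
        raise ValueError("no 'mov $imm32,%rdi' found; is this a PIE binary?")
    # The 4-byte immediate follows the 3 opcode bytes; the CPU sign-extends it.
    (imm,) = struct.unpack_from("<i", code, idx + 3)
    return imm


print(hex(find_main(START_CODE)))  # 0x401102
```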

Symbolicating the binary

Now we need to somehow create a symbol table for our binary with a new entry for main. Its type should be that of a function, and the value of the symbol should be given relative to the .text section. Luckily, objcopy can help us, but first we need to calculate how far our main is from the start of the .text section.

We simply subtract 0x401020 (the start of .text) from 0x401102 (the start of main).

$ python3 -c "print(hex(0x401102-0x401020))"

0xe2
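The same arithmetic can be wrapped in a small helper that also formats the value for objcopy, which saves retyping offsets by hand. This is a minimal sketch; the helper name add_symbol_flag is made up, and it assumes the section start address is known (here .text happens to start at the entry point, as the objdump listing shows, but in general it should be taken from readelf -S).

```python
TEXT_START = 0x401020  # start of .text, from the objdump listing
MAIN_ADDR = 0x401102   # address of main, found via _start


def add_symbol_flag(name: str, addr: int, section_start: int,
                    section: str = ".text") -> str:
    """Build the value for objcopy's --add-symbol option for a global function."""
    offset = addr - section_start
    if offset < 0:
        raise ValueError("address lies before the start of the section")
    return f"{name}={section}:{offset:#x},function,global"


print(add_symbol_flag("main", MAIN_ADDR, TEXT_START))
# main=.text:0xe2,function,global
```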

Next, we will use objcopy to add a global symbol named main, of type function, to our binary.

$ objcopy ./example --add-symbol main=.text:0xe2,function,global ./example-with-symbols

We can now load our new binary into GDB and attempt to print the address of main, or disassemble it, or even break at it.

$ gdb ./example-with-symbols

pwndbg> p main
$1 = {<text variable, no debug info>} 0x401102 <main>

pwndbg> disass main
Dump of assembler code for function main:
   0x0000000000401102 <+0>:     push   rbp
   0x0000000000401103 <+1>:     mov    rbp,rsp
   0x0000000000401106 <+4>:     mov    eax,0x0
   0x000000000040110b <+9>:     pop    rbp
   0x000000000040110c <+10>:    ret  
   
pwndbg> break main
Breakpoint 1 at 0x401106

pwndbg> r
Breakpoint main    

You can repeat the process to symbolicate any other function you care about. Enjoy your symbolicated binary and pop some shells! :)
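When several functions need names, the objcopy command line can be generated instead of typed. Below is a sketch that builds one --add-symbol option per entry. The two extra names are my assumption: 0x401110 and 0x401170 are the values _start loads into RCX and R8, which older glibc versions use for __libc_csu_init and __libc_csu_fini. Verify them in your own binary before relying on these labels. The function objcopy_command is hypothetical too.

```python
import shlex


def objcopy_command(src, dst, symbols, text_start):
    """Return an objcopy argv that adds each symbol as a global function in .text."""
    argv = ["objcopy", src]
    # Sort by address so the generated command is stable and easy to read.
    for name, addr in sorted(symbols.items(), key=lambda item: item[1]):
        offset = addr - text_start
        argv += ["--add-symbol", f"{name}=.text:{offset:#x},function,global"]
    argv.append(dst)
    return argv


symbols = {
    "main": 0x401102,
    "__libc_csu_init": 0x401110,  # assumed name, see the note above
    "__libc_csu_fini": 0x401170,  # assumed name, see the note above
}

cmd = objcopy_command("./example", "./example-with-symbols", symbols, 0x401020)
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```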
