The Trials and Tribulations of Manual GCC Compilation

So I’ve been exploring the GNU C toolchain and the algorithms underlying it. One of the first things I did while exploring the toolchain was to run a lot of gcc’s steps manually.

The goal is to compile this incredibly simple C program.

#include <stdio.h>

int main (int argc, char ** argv)
{
        int age = 27;
        printf("My age is %d\n", age);

        return 0;
}

I am avoiding running the whole gcc command in one step so I can explore how the various stages fit together, but for whatever reason I couldn’t initially get the binary output to run after linking. I ran the link command by hand and pointed the entry point at main, the symbol defined in the assembly (.s) file.

cpp main.c ppmain.c    # GNU C preprocessor
gcc -S -fverbose-asm ppmain.c -o asmain.s    # compile to assembly
gcc -c asmain.s -o objmain.o    # assemble to object code
ld -o main objmain.o --entry main -lc    # link

This follows the typical compiler toolchain order of

  1. Preprocessing
  2. Compiling To Assembly Code
  3. Assembling To Object Code
  4. Linking

I got no warnings and it was clearly executable since ls -la outputs

-rwxrwxr-x 1 kevin kevin 14632 May  9 03:16 main

I still did chmod +x just to see if Linux was bullshitting me, but still no luck.

After some digging, I found a 2002 article on linkers and loaders and saw the term “loader” in the title. I’d never heard of a “loader” before, but later I realized that it was talking about a dynamic loader.

So I continued looking online and couldn’t find much advice to assist me, so the only thing left was to inspect the binary file itself. The standard file format for executables on Linux is called Executable and Linkable Format, or ELF for short. To inspect a binary, you can use the readelf command. The -h option restricts the displayed output to the information in the header at the beginning of the ELF file.

$ readelf -h main

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401030
  Start of program headers:          64 (bytes into file)
  Start of section headers:          13288 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         21
  Section header string table index: 20

To explain why the above information is important, it is necessary to describe what a linker does. The first thing to understand is that there are usually two linkers in reality: a static linker and a dynamic linker. On Linux, static libraries end with .a (short for archive) and dynamic libraries with .so (short for shared object). The former are linked during the build process and the latter are linked at run time. The reason you would want dynamic libraries is that you can change code in a library without having to rebuild the main program itself.

But what does it mean “to link”? Why is it called “linking”? Many programmers mistakenly think that linking is essentially an include that happens later in the build process, but that isn’t technically true. The C preprocessor handles each #include directive at the very beginning of the build by literally copying and pasting the contents of the header file into the preprocessed output. The so-called linker is more accurately described as a bookkeeper.

Prior to linking, the object file is completely ignorant of where any piece of code will live in memory. What the linker actually does is assign memory locations to code and data. It then writes those addresses into the places in the binary that refer to that code and data, places the object file had left blank. The linker essentially makes the object code aware of where everything is actually located. It “links” the memory locations together by writing the missing address information into the final executable file; hence the name “linker”.

So to see whether a linker did its job, the first thing we need to look at is whether it actually assigned a memory address for the start of the program. As you can see from the output above, the entry point address assigned to the beginning of my program is 0x401030. So far so good.

But then I looked at the segments and at first glance it seemed like the linker had done everything right. I was at a loss.

$ readelf --segments main

Elf file type is EXEC (Executable file)
Entry point 0x401030
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000004002a8 0x00000000004002a8
                 0x000000000000000f 0x000000000000000f  R      0x1
      [Requesting program interpreter: /lib/ld64.so.1]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000003a0 0x00000000000003a0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000000067 0x0000000000000067  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000402000 0x0000000000402000
                 0x0000000000000084 0x0000000000000084  R      0x1000
  LOAD           0x0000000000002eb0 0x0000000000403eb0 0x0000000000403eb0
                 0x0000000000000170 0x0000000000000170  RW     0x1000
  DYNAMIC        0x0000000000002eb0 0x0000000000403eb0 0x0000000000403eb0
                 0x0000000000000150 0x0000000000000150  RW     0x8
  NOTE           0x00000000000002b8 0x00000000004002b8 0x00000000004002b8
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_PROPERTY   0x00000000000002b8 0x00000000004002b8 0x00000000004002b8
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002eb0 0x0000000000403eb0 0x0000000000403eb0
                 0x0000000000000150 0x0000000000000150  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.gnu.property .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.plt 
   03     .plt .plt.sec .text 
   04     .rodata .eh_frame 
   05     .dynamic .got.plt 
   06     .dynamic 
   07     .note.gnu.property 
   08     .note.gnu.property 
   09     
   10     .dynamic 

But after a little digging, I found a glaring issue. The problem is this line right here:

[Requesting program interpreter: /lib/ld64.so.1]

When I tried to find this file, the command line returned the following:

$ ls -l  /lib/ld64.so.1
ls: cannot access '/lib/ld64.so.1': No such file or directory

I went to the /lib directory just to make sure and confirmed that the file wasn’t there. What was this /lib/ld64.so.1 file supposed to be? Well, unless a C program is statically linked (for example with the -static option) so that it depends on no shared libraries, it requires a dynamic linker in order to run. It turns out ld used /lib/ld64.so.1 as my interpreter because that is its built-in default. When gcc drives the link, it passes the correct path to ld itself. I wouldn’t have figured this out without actually inspecting the binary. Instead, what my main binary should be asking for is

/lib64/ld-linux-x86-64.so.2

So how do we fix this problem without caving in and turning to gcc witchcraft?

It turns out that ld has a --dynamic-linker option that allows you to specify the correct dynamic linker. The man page for ld does warn against using this option.

The default dynamic linker is normally correct; don’t use this unless you know what you are doing.

When I read this, I said what anyone should: fuck you, this is america.

I reran the GNU linker command with the correct dynamic linker and…

$ ld -o main objmain.o --dynamic-linker /lib64/ld-linux-x86-64.so.2 --entry main -lc
$ ./main
My age is 27
Segmentation fault (core dumped)

I got the correct output (great sign), but then an unexpected segmentation fault and core dump. I tried every gcc warning under the sun, so I’m fairly confident my code conforms to ISO C. It also doesn’t look like a buffering problem, because I checked for that too and even did a bunch of flushing. GCC must still be doing something extra that prevents the segmentation fault.

I don’t think it is occurring in the optimization stage, since gcc’s default optimization level is 0 (-O0), which means it does close to no optimization in order to keep the cost of compilation down. I reduced the program down to just the printf and I’m still getting the segmentation fault. I flushed the buffers on both sides of the printf call and it still segfaults.

Whatever GCC is doing to fix this problem remains a mystery to me, but I hope to figure it out soon. If anyone has any ideas, then let me know.

Is the shutdown included? I believe that’s usually a call to exit.

You know what? I didn’t think about that. Assembly files generated with defaults use atypical names for sections of the assembly code that most compilers don’t expect. For example, not including “--entry main” in the options causes a compiler error in gcc despite the assembly code being generated by the gcc compiler. I would not be shocked if the x86 code had atypicalities elsewhere.

Thanks for bringing that up, @Brian . I’ll let you know if it goes anywhere!

this is a fantastic rabbit hole. earmarking this to read on my day off. ty for sharing

I’m glad that people like you enjoy it. I wanted to do more software threads on talk to appeal to those who express interest in computers. I’ll add more to this thread to talk about discoveries I made trying to solve this new segfault issue.
