 

Build static ELF without libc using unistd.h from Linux headers

I'm interested in building a static ELF program without (g)libc, using unistd.h provided by the Linux headers.

I've read through these articles/questions, which give a rough idea of what I'm trying to do, but not quite: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Compiling without libc

https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free

I have basic code which depends only on unistd.h. My understanding is that each of its functions is provided by the kernel, so libc should not be needed. Here's the path I've taken that seems the most promising:

    $ gcc -I /usr/include/asm/ -nostdlib grabbytes.c -o grabbytesstatic
    /usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
    /tmp/ccn1mSkn.o: In function `main':
    grabbytes.c:(.text+0x38): undefined reference to `open'
    grabbytes.c:(.text+0x64): undefined reference to `lseek'
    grabbytes.c:(.text+0x8f): undefined reference to `lseek'
    grabbytes.c:(.text+0xaa): undefined reference to `read'
    grabbytes.c:(.text+0xc5): undefined reference to `write'
    grabbytes.c:(.text+0xe0): undefined reference to `read'
    collect2: error: ld returned 1 exit status

Before this, I had to define SEEK_END and SEEK_SET manually, using the values found in the kernel headers. Otherwise, compilation failed because they were undefined, which makes sense.
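The SEEK_* values don't have to be copied by hand: the kernel's userspace (UAPI) headers define them, and they also define the system call numbers. A minimal sketch, assuming the UAPI headers are installed (on Debian/Ubuntu they come from the linux-libc-dev package); it only checks the values at compile time:

```c
/* Kernel ABI constants from the Linux UAPI headers; no libc headers involved. */
#include <linux/fs.h>     /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <asm/unistd.h>   /* __NR_read, __NR_write, __NR_lseek, ... */

/* The whence values are fixed by the kernel ABI, so they can be
   checked at compile time instead of being trusted blindly. */
_Static_assert(SEEK_SET == 0, "SEEK_SET must be 0");
_Static_assert(SEEK_CUR == 1, "SEEK_CUR must be 1");
_Static_assert(SEEK_END == 2, "SEEK_END must be 2");

/* The syscall numbers vary by architecture; the header selects
   the right set for the target being compiled for. */
#ifndef __NR_lseek
#error "<asm/unistd.h> did not provide __NR_lseek"
#endif
```

The `__NR_*` numbers are per-architecture (on x86-64, for example, `__NR_lseek` is 8), which is why they are better taken from `<asm/unistd.h>` than hard-coded.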

I imagine that I need to link against an unstripped vmlinux to provide the symbols. However, when I read through its symbols, there were plenty of llseek variants, but none named exactly llseek.

So my question can go in a few directions:

How can I specify an ELF file to take symbols from? I'm guessing that even if that's possible, the symbol names won't match up. If so, is there an existing header file which maps llseek to default_llseek, or to whatever the exact name in the kernel is?

Is there a better way to write POSIX code in C without a libc?

My goal is to write or port fairly standard C code using (perhaps solely) unistd.h and run it without libc. I'm probably okay without a few unistd functions, and I'm not sure which ones exist "purely" as kernel calls. I love assembly, but that's not my goal here. I'm hoping to stay as strictly C as possible (I'm fine with a few external assembly files if I have to), to allow for a libc-less static system at some point.

Thank you for reading!

asked Jan 18 '13 by sega01


2 Answers

If you're looking to write POSIX code in C, abandoning libc is not going to help. Although you could implement a syscall function in assembler and copy structures and defines from the kernel headers, you would essentially be writing your own libc, which almost certainly would not be POSIX compliant. With all the great libc implementations out there, there's almost no reason to start implementing your own.
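For a sense of what "implementing a syscall function" involves, here is a minimal sketch, assuming x86-64 Linux and GCC or Clang extended inline assembly instead of a separate assembler file. The names sys_call3, sys_write, sys_lseek and sys_exit are hypothetical. Note that the raw kernel interface reports errors by returning -errno rather than setting errno; that translation is one of the things a libc does for you.

```c
/* Sketch only: raw Linux system calls without libc, x86-64 assumed. */
#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

#include <asm/unistd.h>   /* __NR_write, __NR_lseek, __NR_exit */
#include <linux/fs.h>     /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Generic 3-argument system call. x86-64 Linux convention:
   number in rax, arguments in rdi, rsi, rdx, result in rax.
   The syscall instruction clobbers rcx and r11. */
static inline long sys_call3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* on failure: a value in [-4095, -1], i.e. -errno */
}

static inline long sys_write(int fd, const void *buf, unsigned long len)
{
    return sys_call3(__NR_write, fd, (long)buf, (long)len);
}

static inline long sys_lseek(int fd, long offset, int whence)
{
    return sys_call3(__NR_lseek, fd, offset, whence);
}

/* exit (not exit_group) ends only the calling thread, which is the
   whole process in a single-threaded program. Never returns. */
static inline __attribute__((noreturn)) void sys_exit(int status)
{
    sys_call3(__NR_exit, status, 0, 0);
    __builtin_unreachable();
}
```

A -nostdlib program still needs a `_start` entry stub (like the start.s in the other answer) to reach C code. GCC may also emit calls to memcpy, memmove, memset and memcmp even when no library is linked, so a fully libc-free build has to supply those four functions itself.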

dietlibc and musl libc are both frugal libc implementations which yield impressively small binaries. The linker is generally smart: as long as a library is written to avoid accidentally pulling in numerous dependencies, only the functions you use will actually be linked into your program.

Here is a simple hello world program:

    #include <unistd.h>

    int main(void)
    {
        char str[] = "Hello, World!\n";
        write(1, str, sizeof str - 1);
        return 0;
    }

Compiling it with musl as below yields a binary of less than 3 KB:

    $ musl-gcc -Os -static hello.c
    $ strip a.out
    $ wc -c a.out
    2800 a.out

dietlibc produces an even smaller binary, less than 1.5 KB:

    $ diet -Os gcc hello.c
    $ strip a.out
    $ wc -c a.out
    1360 a.out
answered Oct 13 '22 by Dave


This is far from ideal, but a little bit of (x86_64) assembler gets me down to just under 5 KB. Most of that is not code: the actual code is under 1 KB (771 bytes, to be precise), but the file is much larger, I think because the code is padded to 4 KB and then some header/footer/extra data is added on top of that.

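Here's what I did:

    gcc -g -static -nostdlib -o glibc start.s glibc.c -Os -lc

glibc.c contains:

    #include <unistd.h>

    int main(void)
    {
        const char str[] = "Hello, World!\n";
        /* sizeof(str) includes the terminating NUL, so don't write it */
        write(1, str, sizeof(str) - 1);

        /* _exit is supplied by start.s below, not by libc */
        _exit(0);
    }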

start.s contains:

        .globl _start
    _start:
        xor     %ebp, %ebp        # mark the outermost stack frame
        mov     %rdx, %r9
        mov     %rsp, %rdx
        and     $~15, %rsp        # align the stack to 16 bytes
        push    $0
        push    %rsp

        call    main

        hlt


        .globl _exit
    _exit:
        # %rdi already holds the exit code (first argument)
        mov     $0x3c, %eax       # 60 = exit system call on x86-64
        syscall
        hlt

The main point of this is to show that it's not the system-call part of glibc that takes up a lot of space, but the startup ("preparing things") code. Beware that if you were to call, for example, printf, possibly even (v)sprintf, or exit(), or any other "standard library" function, you are in the land of "nobody knows what will happen", because libc's own startup code never ran.
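When libc's startup code has not run, formatted output is easiest to do by hand: convert the number to characters yourself and pass the bytes to write. A minimal sketch; fmt_u64 is a hypothetical helper that handles only unsigned decimal and uses no library calls:

```c
/* Formats v in decimal into buf (which must hold at least 20 bytes;
   no terminating NUL is written) and returns the number of characters.
   Uses no libc, so it is safe to call from a -nostdlib program. */
static inline unsigned long fmt_u64(char *buf, unsigned long long v)
{
    char tmp[20];            /* 2^64 - 1 has 20 decimal digits */
    unsigned long n = 0;

    do {                     /* produce digits least-significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    for (unsigned long i = 0; i < n; i++)   /* copy them back reversed */
        buf[i] = tmp[n - 1 - i];
    return n;
}
```

For example, `char buf[20]; write(1, buf, fmt_u64(buf, 771));` prints 771 without touching stdio. On 32-bit targets the 64-bit division may compile to calls into libgcc helpers such as __udivdi3, so such a build may also need `-lgcc`.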

Edit: Updated "start.s" to put argc/argv in the right places:

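    _start:
        xor     %ebp, %ebp
        mov     %rdx, %r9
        pop     %rdi              # %rdi = argc
        mov     %rsp, %rsi        # %rsi = argv
        and     $~15, %rsp        # align the stack to 16 bytes
        push    %rax              # two 8-byte pushes keep %rsp 16-byte aligned
        push    %rsp

        # %rdi = argc, %rsi = argv
        call    main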

Note that I've changed which registers hold which values, so that they match what main expects (%rdi = argc, %rsi = argv); I had them in slightly the wrong order in the previous code.
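The register choices follow from how the kernel lays out the initial stack on x86-64 (System V ABI): at entry to _start, %rsp points at argc, followed by the argv pointers, a NULL, the envp pointers, another NULL, and then the auxiliary vector. So popping argc into %rdi leaves %rsp pointing at argv, which goes into %rsi. The same decoding in C, as a sketch; struct initial_stack and decode_initial_stack are hypothetical names, and sp must be the value %rsp had at entry to _start:

```c
/* Hypothetical decoded view of the initial process stack. */
struct initial_stack {
    long   argc;
    char **argv;   /* argv[argc] is NULL */
    char **envp;   /* terminated by a NULL pointer; auxv follows it */
};

/* sp: value of %rsp on entry to _start, before any push or pop.
   Layout: argc, argv[0] .. argv[argc-1], NULL, envp[0] .., NULL, auxv ... */
static inline struct initial_stack decode_initial_stack(long *sp)
{
    struct initial_stack s;

    s.argc = sp[0];                  /* what "pop %rdi" loads */
    s.argv = (char **)(sp + 1);      /* what "mov %rsp, %rsi" then passes */
    s.envp = s.argv + s.argc + 1;    /* skip the argv entries and their NULL */
    return s;
}
```

C code cannot reliably read the entry value of %rsp itself, which is why a small assembly stub is still needed to hand that pointer (or argc and argv directly, as above) to C.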

answered Oct 13 '22 by Mats Petersson