I have a NASM assembly file that I am assembling and linking (on Intel-64 Linux). There is a text file, and I want its contents to appear in the resulting binary (as a string, basically). The binary is an ELF executable. My plan is to create a new read-only data section in the ELF file (equivalent to the conventional .rodata section). Ideally, there would be a tool to add a file verbatim as a new section in an ELF file, or a linker option to include a file verbatim. Is this possible?

How do I add contents of text file as a section in an ELF file?

This is possible, and it is most easily done using OBJCOPY, found in BINUTILS. You effectively take the data file as binary input and then output it in an object file format that can be linked with your program.

OBJCOPY will even produce start and end symbols, as well as a symbol for the size of the data area, so that you can reference them in your code. The basic idea is to tell it that your input file is binary (even if it is text), that you are targeting an x86-64 object file, and what the input and output file names are.

Assume we have an input file called myfile.txt with the contents:

the
quick
brown
fox
jumps
over
the
lazy
dog

Something like this would be a starting point:

objcopy --input binary \
    --output elf64-x86-64 \
    --binary-architecture i386:x86-64 \
    myfile.txt myfile.o

If you wanted to generate 32-bit objects you could use:

objcopy --input binary \
    --output elf32-i386 \
    --binary-architecture i386 \
    myfile.txt myfile.o

The output is an object file called myfile.o. If we review its headers with OBJDUMP, using a command like objdump -x myfile.o, we would see something like this:

myfile.o:     file format elf64-x86-64
myfile.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 g       .data  0000000000000000 _binary_myfile_txt_start
000000000000002c g       .data  0000000000000000 _binary_myfile_txt_end
000000000000002c g       *ABS*  0000000000000000 _binary_myfile_txt_size

By default OBJCOPY puts the contents of the file in a .data section and creates a number of symbols that can be used to reference the data:

_binary_myfile_txt_start
_binary_myfile_txt_end
_binary_myfile_txt_size

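These are, respectively, the address of the first byte, the address just past the last byte, and the size of the data that was placed into the object from the file myfile.txt. Note that _binary_myfile_txt_size is an absolute (*ABS*) symbol: its value is the size itself rather than the address of anything. OBJCOPY bases the symbol names on the input file name: myfile.txt is mangled into myfile_txt and used to create the symbols.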
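The naming rule can be sketched in C. This is a minimal sketch, assuming the rule used by BFD's binary input format: the name is "_binary_", then the input file name exactly as given on the command line, then "_" and a suffix, after which every character that is not a letter or digit is replaced by an underscore. One consequence is that a path such as data/myfile.txt yields _binary_data_myfile_txt_start rather than _binary_myfile_txt_start. The function name objcopy_symbol_name is hypothetical and not part of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the name objcopy derives for binary input,
   "_binary_" + PATH + "_" + SUFFIX (SUFFIX is "start", "end" or "size"),
   then turn every character that is not a letter or digit into '_'.
   Returns 0 on success, -1 if OUT (OUT_SIZE bytes) is too small. */
int objcopy_symbol_name(const char *path, const char *suffix,
                        char *out, size_t out_size)
{
    assert(path != NULL && suffix != NULL && out != NULL);
    int n = snprintf(out, out_size, "_binary_%s_%s", path, suffix);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    /* isalnum() in the default "C" locale matches ASCII letters/digits */
    for (char *p = out; *p != '\0'; p++)
        if (!isalnum((unsigned char)*p))
            *p = '_';
    return 0;
}
```

For example, objcopy_symbol_name("myfile.txt", "end", buf, sizeof buf) fills buf with _binary_myfile_txt_end, which matches the symbol table shown by OBJDUMP.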

One problem is that the .data section created this way is writable (its flags do not include READONLY), as seen here:

Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000002c  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA

You specifically asked for a .rodata section, which should also have the READONLY flag. You can use the --rename-section option to rename .data to .rodata and set the needed flags at the same time by adding this to the command line:

--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA

If you want the section to have a name other than .rodata while keeping the same read-only flags, replace .rodata in the line above with the name you want to use.

The final version of the command that should generate the type of object you want is:

objcopy --input binary \
    --output elf64-x86-64 \
    --binary-architecture i386:x86-64 \
    --rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
    myfile.txt myfile.o
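OBJDUMP is the usual way to confirm that the renamed section really is read-only (objdump -h myfile.o lists each section's flags). As a self-contained alternative, the following minimal C sketch reads an ELF64 file's section headers using the definitions in <elf.h> and reports the flags of one named section; READONLY in OBJDUMP's output corresponds to SHF_WRITE being absent. It assumes an ELF64 file in the host's byte order (such as x86-64 Linux), rejects files that use extended section numbering, and the function name elf64_section_flags is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store the sh_flags of section NAME from the ELF64
   file at PATH in *FLAGS. Returns 0 on success, -1 on error or if no such
   section exists. Assumes host byte order; extended numbering rejected. */
int elf64_section_flags(const char *path, const char *name, uint64_t *flags)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;
    int rc = -1;

    assert(path != NULL && name != NULL && flags != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Validate the ELF header enough to trust the section table fields. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names are offsets into the section-name string table. */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';   /* guard against a malformed table */

    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            *flags = sh[i].sh_flags;
            rc = 0;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return rc;
}
```

Calling elf64_section_flags("myfile.o", ".rodata", &flags) on the object produced by the final command should succeed with SHF_ALLOC set and SHF_WRITE clear.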

Now that you have an object file, how can you use it, for example from C code? The symbols generated are a bit unusual, and there is a reasonable explanation on the OS Dev Wiki:

A common problem is getting garbage data when trying to use a value defined in a linker script. This is usually because they're dereferencing the symbol. A symbol defined in a linker script (e.g. _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss the code will try to read a 32-bit integer from the address indicated by _ebss.

The solution to this is to take the address of _ebss either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss as arrays must be explicitly dereferenced)
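To experiment with this convention without running OBJCOPY, you can create equivalent labels yourself. The sketch below is an illustration under stated assumptions, not how OBJCOPY works internally: it assumes GCC or Clang on an ELF target, and the symbols demo_blob_start and demo_blob_end are hypothetical stand-ins defined by a top-level assembler block. Because they are declared as unsized arrays, each name evaluates to the label's address, and the data is only read through explicit indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the labels OBJCOPY would create, defined in a
   top-level GNU assembler block (assumes GCC or Clang on an ELF target).
   Like objcopy's symbols, they mark addresses; they are not variables. */
__asm__(
    "  .pushsection .rodata\n"
    "  .globl demo_blob_start\n"
    "demo_blob_start:\n"
    "  .ascii \"the quick\\n\"\n"
    "  .globl demo_blob_end\n"
    "demo_blob_end:\n"
    "  .popsection\n");

/* Unsized arrays: the name yields the label's address, and reading the
   bytes requires explicit indexing such as demo_blob_start[0]. */
extern const char demo_blob_start[];
extern const char demo_blob_end[];

/* Number of bytes between the two labels, computed from their addresses. */
size_t demo_blob_size(void)
{
    assert(demo_blob_start <= demo_blob_end);
    return (size_t)(demo_blob_end - demo_blob_start);
}
```

Here demo_blob_size() returns 10, the length of "the quick\n"; the same end-minus-start subtraction works with _binary_myfile_txt_start and _binary_myfile_txt_end.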

Keeping this in mind we could create this C file called main.c:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];

int main(void)
{
    char *data_start = _binary_myfile_txt_start;
    char *data_end   = _binary_myfile_txt_end;
    /* _size is an absolute symbol: its address *is* the size */
    size_t data_size = (size_t)(uintptr_t)_binary_myfile_txt_size;

    /* Print out the pointers and size (%p expects a void pointer) */
    printf("data_start %p\n", (void *)data_start);
    printf("data_end   %p\n", (void *)data_end);
    printf("data_size  %zu\n", data_size);

    /* Print out each byte until we reach the end */
    while (data_start < data_end)
        putchar(*data_start++);

    return 0;
}

You can compile and link with:

gcc -O3 main.c myfile.o

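The exact addresses will differ on your system; the run below is from a non-PIE build. On distributions where GCC produces position-independent executables (PIE) by default, the value obtained through the absolute symbol _binary_myfile_txt_size may also be wrong, while data_end - data_start is unaffected; adding -no-pie to the gcc command gives a non-PIE executable like the one used here. The output should look something like: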

data_start 0x4006a2
data_end   0x4006ce
data_size  44
the
quick
brown
fox
jumps
over
the
lazy
dog
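The question asks for the file's contents "as a string", but OBJCOPY copies the file byte for byte and adds no terminating NUL (the section is exactly 0x2c = 44 bytes, the size of myfile.txt), so the data cannot be handed directly to strlen or printf("%s"). A minimal sketch, using a hypothetical helper named blob_to_string, copies the bytes between the start and end symbols into a NUL-terminated heap buffer. If the file itself contains NUL bytes, C string functions will still stop at the first one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the bytes in [start, end) into a new
   NUL-terminated heap buffer. START/END would be objcopy symbols such as
   _binary_myfile_txt_start and _binary_myfile_txt_end.
   Returns NULL if malloc fails; the caller must free() the result. */
char *blob_to_string(const char *start, const char *end)
{
    assert(start <= end);
    size_t len = (size_t)(end - start);  /* size from the two addresses */
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    s[len] = '\0';                       /* objcopy does not add one */
    return s;
}
```

In main.c this could be called as blob_to_string(_binary_myfile_txt_start, _binary_myfile_txt_end), freeing the result when it is no longer needed.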

Using the symbols from NASM is similar in nature to the C code. The following assembly program, nmain.asm, writes the same text to standard output using Linux x86-64 system calls:

bits 64
global _start

extern _binary_myfile_txt_start
extern _binary_myfile_txt_end
extern _binary_myfile_txt_size

section .text

_start:
    mov eax, 1                        ; SYS_Write system call
    mov edi, eax                      ; Standard output FD = 1
    mov rsi, _binary_myfile_txt_start ; Address to start of string
    mov rdx, _binary_myfile_txt_size  ; Length of string
    syscall

    xor edi, edi                      ; Return value = 0
    mov eax, 60                       ; SYS_Exit system call
    syscall

This can be assembled and linked with:

nasm -f elf64 -o nmain.o nmain.asm
gcc -m64 -nostdlib nmain.o myfile.o

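If your gcc produces PIE executables by default, the linker may warn about (or reject) the absolute 64-bit addresses loaded by the mov instructions; adding -no-pie to the gcc command, or linking directly with ld -o nmain nmain.o myfile.o, produces a conventional non-PIE executable. Running the program should print: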

the
quick
brown
fox
jumps
over
the
lazy
dog
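The NASM program makes a single write system call and ignores its return value. That is fine for a 44-byte file written to a terminal, but write(2) may transfer fewer bytes than requested (for example on a non-blocking descriptor or after a signal) or fail with EINTR. A C sketch of the same idea with a retry loop follows; the helper name write_blob is hypothetical, and in a real program start and end would be _binary_myfile_txt_start and _binary_myfile_txt_end.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write every byte in [start, end) to FD, retrying
   after short writes and EINTR. Returns 0 on success, or -1 on error with
   errno set by write(2). */
int write_blob(int fd, const char *start, const char *end)
{
    assert(start <= end);
    while (start < end) {
        ssize_t n = write(fd, start, (size_t)(end - start));
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before anything was written */
            return -1;
        }
        start += n;         /* skip past the bytes actually written */
    }
    return 0;
}
```

Calling write_blob(STDOUT_FILENO, _binary_myfile_txt_start, _binary_myfile_txt_end) corresponds to the sys_write call in nmain.asm.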

answered Oct 12 '22 09:10

Michael Petch

Tags: x86, linker, intel, elf, nasm, objcopy

David Jones
