
What is a reasonable minimum number of assembly instructions for a small C program including setup?

I'm trying to generate the smallest C program possible to see how many instructions are executed by running it. I disabled use of libraries and disabled vdso. Yet, my C program, which gdb says is 7 assembly instructions, ends up executing 17k instructions according to perf stat.

Is this a normal amount of instructions just to set up the program? According to gdb, code from ld-linux-x86-64.so.2 is mapped into the program address space. Given that I disabled vdso and am including no libraries, is this file necessary to run the program? Could this be the reason for the 17k instructions?

My C program, foo5.c:

int main(){
    char* str = "Hello World";
    return 0;
}

How I compile:

gcc -nostdlib -nodefaultlibs stubstart.S -o foo5 foo5.c

stubstart.S:

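.globl _start
_start:
    call main               # run main(); its return value is ignored
    movl $1, %eax           # syscall number 1 = exit in the 32-bit int $0x80 ABI
    xorl %ebx, %ebx         # exit status 0
    int $0x80               # enter the kernel via the 32-bit system-call ABI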

perf stat output:

Performance counter stats for './foo5':

              0.60 msec task-clock:u              #    0.015 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
                11      page-faults:u             #    0.018 M/sec                  
            46,646      cycles:u                  #    0.077 GHz                    
            17,224      instructions:u            #    0.37  insn per cycle         
             5,145      branches:u                #    8.513 M/sec                  
               435      branch-misses:u           #    8.45% of all branches  

gdb program layout:

`/home/foo5', file type elf64-x86-64.
    Entry point: 0x5555555542b1
    0x0000555555554238 - 0x0000555555554254 is .interp
    0x0000555555554254 - 0x0000555555554278 is .note.gnu.build-id
    0x0000555555554278 - 0x0000555555554294 is .gnu.hash
    0x0000555555554298 - 0x00005555555542b0 is .dynsym
    0x00005555555542b0 - 0x00005555555542b1 is .dynstr
    0x00005555555542b1 - 0x00005555555542d5 is .text
    0x00005555555542d5 - 0x00005555555542e1 is .rodata
    0x00005555555542e4 - 0x00005555555542f8 is .eh_frame_hdr
    0x00005555555542f8 - 0x0000555555554330 is .eh_frame
    0x0000555555754f20 - 0x0000555555755000 is .dynamic
    0x00007ffff7dd51c8 - 0x00007ffff7dd51ec is .note.gnu.build-id in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd51f0 - 0x00007ffff7dd52c4 is .hash in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd52c8 - 0x00007ffff7dd53c0 is .gnu.hash in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd53c0 - 0x00007ffff7dd56f0 is .dynsym in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd56f0 - 0x00007ffff7dd5914 is .dynstr in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5914 - 0x00007ffff7dd5958 is .gnu.version in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5958 - 0x00007ffff7dd59fc is .gnu.version_d in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5a00 - 0x00007ffff7dd5dd8 is .rela.dyn in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5dd8 - 0x00007ffff7dd5e80 is .rela.plt in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5e80 - 0x00007ffff7dd5f00 is .plt in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5f00 - 0x00007ffff7dd5f08 is .plt.got in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5f10 - 0x00007ffff7df4b20 is .text in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df4b20 - 0x00007ffff7df9140 is .rodata in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df9140 - 0x00007ffff7df9141 is .stapsdt.base in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df9144 - 0x00007ffff7df97b0 is .eh_frame_hdr in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df97b0 - 0x00007ffff7dfbc24 is .eh_frame in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffc680 - 0x00007ffff7ffce64 is .data.rel.ro in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffce68 - 0x00007ffff7ffcfd8 is .dynamic in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffcfd8 - 0x00007ffff7ffcfe8 is .got in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffd000 - 0x00007ffff7ffd050 is .got.plt in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffd060 - 0x00007ffff7ffdfd8 is .data in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffdfe0 - 0x00007ffff7ffe170 is .bss in /lib64/ld-linux-x86-64.so.2

UPDATE:

In the end, jester's comment about creating a standard executable instead of a PIE to remove the ld.so by adding the -no-pie flag to gcc reduced the perf instruction stat to 12. Then old_timer's -O2 suggestion further reduced it to 7! Thank you everyone.

UPDATE 2: The selected answer of using -static also reduces the instruction count from 17k to 12. Excellent answer.

Also this article linked by commenters is relevant and entertaining.

wxz asked Mar 06 '20 19:03



1 Answer

TL;DR: -static is not the default; use it to make an ELF executable that runs only your _start.

-no-pie -nostdlib will also make a static executable simply because it's non-PIE and there are no dynamic libraries to link.

There is also -static-pie, where the kernel loads your executable at a randomized base address without running ld.so first (I think), but that's not what you get with -static.


Just to be clear, we're talking about the dynamic instruction count (how many are actually executed in user-space, perf stat -e instructions:u), not a static count (how many are sitting on disk / in memory as part of the executable). A static count only counts instructions inside loops once, and still counts instructions that never execute.

At least, that's the question I'm answering. It makes metadata in other sections, and code that never executes, irrelevant.

"According to gdb, code from ld-linux-x86-64.so.2 is mapped into the program address space. Given that I disabled vdso and am including no libraries, is this file necessary to run the program?"

You still built a position-independent executable (PIE). This is an ELF shared object with an entry point, so it's still dynamically linked, and the ld.so ELF interpreter runs on it. There's nothing for it to do because you don't actually use any shared libraries, but 17k user-space instructions sounds about right. I get 32606 or 32607 instructions for your program on my Arch Linux system (glibc 2.31).

ld.so is started as an "interpreter" for your binary in a similar way to how /bin/sh is started to interpret an executable text file that starts with #!/bin/sh. (Although Linux's ELF program loader still does some of the work of mapping program segments into memory according to the program header of the executable, so ld.so doesn't have to do that manually with system calls.)

You can see this by running gdb ./foo5 and using starti instead of run to stop before the first user-space instruction. You'll see that you're in ld.so's _start.

Reading symbols from ./foo5...
(No debugging symbols found in ./foo5)
Cannot access memory at address 0x1024   ### note this isn't a real address,
                     ### just an offset relative to the base address / start of the file.
                     ### That's another clue this is a PIE
(gdb) starti

Program stopped.
0x00007ffff7fd3100 in _start () from /lib64/ld-linux-x86-64.so.2

You can also run strace ./foo5 to see the system calls it makes, as an indication that there's a bunch of stuff happening:

$ strace ./foo5
execve("./foo5", ["./foo5"], 0x7ffc12394d90 /* 50 vars */) = 0
brk(NULL)                               = 0x55741b4b7000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffca69312b0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d4fc4b000
arch_prctl(ARCH_SET_FS, 0x7f1d4fc4ba80) = 0
mprotect(0x557419622000, 4096, PROT_READ) = 0
strace: [ Process PID=303809 runs in 32 bit mode. ]
exit(0)                                 = ?

(Note the "runs in 32 bit mode"; it doesn't, but strace detected that you used the 32-bit int $0x80 ABI instead of the normal syscall ABI that ld.so used.)
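Whether ld.so runs at all is decided by a PT_INTERP program header in the executable: if one is present, the kernel loads the named interpreter and enters it first. As a rough sketch of how to check for it (the function name elf64_interp_path is made up for illustration, and the code assumes a Linux <elf.h> and an ELF64 image with the same byte order as the host):

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the PT_INTERP path stored in a 64-bit ELF
 * image held in memory, or NULL if the image is not ELF64, is truncated,
 * or requests no interpreter. Assumes host byte order matches the file. */
static const char *elf64_interp_path(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);          /* copy: image may be unaligned */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        return NULL;

    /* Walk the program header table looking for PT_INTERP. */
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + i * sizeof ph;

        if (off > len || len - off < sizeof ph)
            return NULL;                    /* header table runs past the image */
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;

        /* The path must lie inside the image and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)image + ph.p_offset;
    }
    return NULL;                            /* no interpreter requested */
}
```

Reading an executable into a buffer (or mmap-ing it) and passing it to this function should give the same interpreter path that file reports for a dynamically linked binary, and NULL for one that the kernel starts directly at its own entry point.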


Use -static

-nostdlib used to imply -static, with a GCC configured not to make PIEs by default. But modern distros do configure GCC to make PIEs by default, for security reasons. See "32-bit absolute addresses no longer allowed in x86-64 Linux?"

$ file foo5
foo5: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1ac0a9af247fefebde100695805e5b73f06e891c, not stripped

After building with -static, OTOH:

$ file foo5
foo5: ELF 64-bit LSB executable ...
$ perf stat --all-user ./foo5

 Performance counter stats for './foo5':

              0.03 msec task-clock                #    0.151 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                 1      page-faults               #    0.030 M/sec                  
             1,930      cycles                    #    0.058 GHz                    
                12      instructions              #    0.01  insn per cycle         
                 4      branches                  #    0.121 M/sec                  
                 0      branch-misses             #    0.00% of all branches        

       0.000219151 seconds time elapsed

       0.000284000 seconds user
       0.000000000 seconds sys

(Odd that perf doesn't print :u for the events when you use --all-user. My system has /proc/sys/kernel/perf_event_paranoid = 0 so if I don't use that, it also counts instructions executed inside the kernel. That varies significantly from run to run, but around 60k total for this static executable.)

I only count 11 user-space instructions that execute, but apparently my i7-6700k counts 12 for that event. (There is hardware support for masking user, kernel, or both for any event counter. This is what perf uses.)
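For illustration, here is roughly how a program can request a user-space-only instruction count for itself through the same kernel interface perf uses, perf_event_open(2). This is a sketch under assumptions: Linux, a PMU that is visible to the process, and a perf_event_paranoid setting that permits it. The names count_user_instructions and tiny_workload are invented for the example.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical example workload: any function could be measured instead. */
static void tiny_workload(void)
{
    volatile int x = 0;
    x++;
}

/* Hypothetical helper: count user-space instructions retired while fn()
 * runs in the calling thread. Returns -1 if the counter is unavailable
 * (no PMU, restrictive perf_event_paranoid, sandboxing, ...). */
static long long count_user_instructions(void (*fn)(void))
{
    struct perf_event_attr attr;
    uint64_t count = 0;
    ssize_t got;
    int fd;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* the user-only mask, like the :u modifier */
    attr.exclude_hv = 1;

    /* glibc has no wrapper, so go through syscall(2).
     * pid = 0 (this thread), cpu = -1 (any), no group, no flags. */
    fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    got = read(fd, &count, sizeof count);
    close(fd);
    return got == (ssize_t)sizeof count ? (long long)count : -1;
}
```

The result includes the user-space instructions of the ioctl wrappers on either side of fn(), so it is an upper bound on the workload itself rather than an exact figure.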

GDB also confirms success:

Reading symbols from ./foo5...
(No debugging symbols found in ./foo5)
Cannot access memory at address 0x401024
(gdb) starti
Starting program: /tmp/foo5

Program stopped.
0x0000000000401000 in _start ()
(gdb) 

And the disassembly window from layout reg shows:

│  >0x401000 <_start>       call   0x40100e <main>
│   0x401005 <_start+5>     mov    eax,0x1
│   0x40100a <_start+10>    xor    ebx,ebx
│   0x40100c <_start+12>    int    0x80
│   0x40100e <main>         push   rbp
│   0x40100f <main+1>       mov    rbp,rsp
│   0x401012 <main+4>       lea    rax,[rip+0xfe7]        # 0x402000
│   0x401019 <main+11>      mov    QWORD PTR [rbp-0x8],rax
│   0x40101d <main+15>      mov    eax,0x0
│   0x401022 <main+20>      pop    rbp
│   0x401023 <main+21>      ret

You could have compiled with -O2 to optimize your main down to just an xor eax,eax / ret, or not call it at all so only 3 user-space instructions had to execute.

Or, to optimize your user-space instruction count while still using C, see @mosvy's answer about writing _start in C, with an inline-asm _exit(2) that can inline into it.
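A minimal sketch of that idea, not taken from the linked answer: it assumes x86-64 Linux and GNU C inline asm, and the names raw_syscall1, NR_EXIT and FREESTANDING_START are made up for this example.

```c
/* x86-64 Linux only: uses the 64-bit syscall instruction and its
 * register convention (number in RAX, first argument in RDI). */

#define NR_EXIT 60   /* hypothetical name; 60 is __NR_exit on x86-64 */

/* Hypothetical helper: one-argument system call via inline asm.
 * The syscall instruction clobbers RCX and R11. */
static inline long raw_syscall1(long nr, long arg1)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(arg1)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Only define _start when building without the normal C runtime,
 * e.g. (assumed command line):
 *   gcc -O2 -static -nostdlib -DFREESTANDING_START start.c -o start
 * The guard macro is hypothetical and lets the helper above also be
 * compiled into an ordinary hosted program. */
#ifdef FREESTANDING_START
void _start(void)
{
    /* Nothing is called here, so the fact that process entry does not
     * look like a normal function call (no return address on the
     * stack) does not matter for this tiny body. */
    raw_syscall1(NR_EXIT, 0);   /* exit(0); never returns */
    __builtin_unreachable();
}
#endif
```

With optimization enabled the helper should inline into _start, leaving only a handful of instructions around the syscall; the exact count depends on compiler version and default hardening options.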

Note that your _start fails to pass argc and argv to main, although it does have RSP properly 16-byte aligned before the function call (because the x86-64 SysV ABI guarantees that process entry happens with the stack aligned). You could pass them with a mov load and an LEA. Also note that since you don't initialize libc, you couldn't call its functions even if you statically linked it.
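To make the layout concrete: at process entry the x86-64 SysV ABI puts argc at the address in RSP, followed by the argv pointers, a NULL, the envp pointers, and another NULL. The sketch below decodes that layout in C from a pointer to the initial stack; the struct and function names are hypothetical, and getting the entry-time RSP into such a pointer still takes a little asm in _start.

```c
#include <stdint.h>

/* Hypothetical container for what the kernel leaves on the stack. */
struct startup_args {
    long   argc;
    char **argv;
    char **envp;
};

/* Hypothetical helper: decode the initial process stack.
 * sp must be the value RSP had at process entry, viewed as an array of
 * pointer-sized slots:
 *   sp[0]           argc (stored as an integer in a pointer-sized slot)
 *   sp[1..argc]     argv[0..argc-1]
 *   sp[argc+1]      NULL terminator of argv
 *   sp[argc+2...]   envp entries, then NULL
 * In asm, argc is the "mov load" from (%rsp) and argv is the LEA of
 * 8(%rsp). */
static struct startup_args read_initial_stack(char **sp)
{
    struct startup_args a;

    a.argc = (long)(intptr_t)sp[0];
    a.argv = sp + 1;
    a.envp = a.argv + a.argc + 1;   /* skip argv and its NULL terminator */
    return a;
}
```

A _start written in asm could then pass its entry-time RSP to a C function that calls read_initial_stack and forwards argc and argv to main.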

See "How Get arguments value using inline assembly in C without Glibc?" for some hacks. (Basically a stand-alone asm _start written in an asm() statement at global scope; my answer there is a total hack on the calling convention.)

Peter Cordes answered Sep 28 '22 10:09