Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can this Linux / 32bit x86 assembly "Hello, World" be made smaller still?

The following 32bit x86 Linux program prints a string of arbitrary length (as long as a program can be, anyway) and does exit(0) afterwards:

.global _start             ; notice on entry here, all regs but %esp are zero
_start:
    call  .L0              ; offset == strlen, provided by your assembler
.byte 'H','e','l','l','o',',',' ','W','o','r','l','d'
.L0:
    pop   %ecx             ; ret addr is starting addr of string
    mov   -4(%ecx),%edx    ; argument to `call`, 4 bytes: strlen
    inc   %ebx             ; stdout == 1
    movb  $4, %al          ; SYS_write == 4
    int   $0x80
    xchg  %eax,%ebp        ; %ebp is still zero
    xchg  %eax,%ebx        ; SYS_exit == 1, return value == 0
    int   $0x80

If one's willing to sacrifice the position-independence (instead, force the linker to insert the string address), and not care about the program returning zero, one can get it down to:

.global _start
_start:
    movb  $4, %al
    inc   %ebx
    mov   $.L0, %ecx       ; this address is calculated when linking
    movb  $.Lend-.L0, %dl  ; strlen, calculated by assembler
    int   $0x80
    xchg  %eax,%ebx
    int   %0x80
.L0:
.byte 'H','e','l','l','o',',',' ','W','o','r','l','d'
.Lend:

Both of these can be assembled/linked via as --32 -o x.o x.S; ld -s -m elf_i386 x.o, and run just fine. The second one is 26 Bytes of code. If you permit a crash after printing Hello, World then leave the last two instructions out, 23 Bytes. That's as low as I could go.

Question that's always bugged me, is it possible to squeeze a few more bytes off this ? Pure speculation of mine gives these possible leads:

  • Somehow use parts of the 'Hello, World' itself as code ?
  • Anyone knows a usable syscall easter egg ?
  • trick the linker into making the entrypoint a 16-bit address so that movw $.L0, %cx could be used (saves one byte) ?
  • Do an 8-bit offset jmp to a place that's known (or created via assembler / linker invocation magic) to contain the necessary instructions for the exit(...) syscall, saving one byte over the xchg; int sequence ?

Or else, can it be proven that this actually is the smallest well-behaved (no crash / return code zero) Linux/x86 "Hello, World" ?

Edit

To clarify, the question is not about minimizing the size of the ELF executable; techniques for that are long-known. I'm explicitly inquiring about the size of a Linux 32bit x86 assembly program that performs the equivalent of what the compiled code for:

int main(int argc, char **argv)
{
    puts("Hello, World");
    exit(0); /* or whatever code */
}

would do.
In fact, I'll be happy about anything that does not require manual editing of the ELF headers. If you find a way to e.g. stuff the "Hello, World" into some ELF object and referencing that from the assembly source, using only the assembler / linker command line and/or a mapfile input, I'd consider it valid enough, even though that increases the size of the ELF executable. I just want to know if the instruction sequence to print "Hello, World" and exit() afterwards can be shrunk still.
The question is about code size, not executable size.

like image 452
FrankH. Avatar asked Sep 15 '11 16:09

FrankH.


1 Answers

It's been done back in 1999. Have a look at this page (spoiler: the end result is a 45-byte ELF file). Make sure to read the postscript too.

like image 191
Igor Skochinsky Avatar answered Sep 29 '22 18:09

Igor Skochinsky