Are assembly programs almost the same size as C programs?

For example, I created a simple C program that prints "Hello, World" and compiled it; the resulting executable was 39.8 KB.

Following this question, I was able to write the equivalent program in assembly. That executable was 39.6 KB.

This surprised me greatly, as I expected the assembly program to be smaller than the C program. As that question indicates, the assembly version uses a C header and is built with the gcc compiler. Would this make the assembly program bigger, or is it normal for both to be roughly the same size?


Using the strip command I reduced both files. This removed the debug information, and now both files are the same size: 18.5 KB.

test.c:

Xantium asked Dec 05 '22 12:12


2 Answers

If your hand-written code is on par with a compiled function, then sure, they are going to be similar in size: they are doing the same thing, and if you can compete with a compiler your output will be the same size or similar.

Your file sizes indicate you are looking at the wrong thing altogether. The file you are looking at, while called a binary, has a ton of other stuff in it. If you want to compare apples to apples in this context, compare the size of the functions (the machine code), not the size of the container that holds the functions plus debug info plus strings plus a number of other things.
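One way to make that comparison on Linux is to add up only the sections the ELF file marks as executable (the `SHF_EXECINSTR` flag, which `.text` carries) and set that against the whole file size. A rough sketch, assuming a 64-bit ELF file whose byte order matches the host and a system that provides `<elf.h>`; `elf64_code_bytes` is a hypothetical helper, and a Windows .exe (PE format) would need different parsing:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Sum the sizes of the executable (SHF_EXECINSTR) sections of a 64-bit ELF
   file and report the total file size. Returns 0 on success, -1 on error.
   Assumes: ELF64, same byte order as the host, and no extended section
   numbering (fewer than SHN_LORESERVE sections). */
int elf64_code_bytes(const char *path, unsigned long long *code_bytes,
                     unsigned long long *file_bytes)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh;
    unsigned long long total = 0;
    long end;
    int rc = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    /* Check the "\177ELF" magic, the 64-bit class and the header entry size. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        goto out;

    for (unsigned int i = 0; i < eh.e_shnum; i++) {
        /* Seek to the i-th section header and read it. */
        if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)i * sizeof sh), SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1)
            goto out;
        /* Count only sections that hold machine code and occupy file space. */
        if ((sh.sh_flags & SHF_EXECINSTR) && sh.sh_type != SHT_NOBITS)
            total += sh.sh_size;
    }

    /* The whole container: code plus data, symbols, strings, debug info. */
    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0)
        goto out;

    *code_bytes = total;
    *file_bytes = (unsigned long long)end;
    rc = 0;
out:
    fclose(f);
    return rc;
}
```

Run on the same program before and after `strip`, the code total should stay the same while the file size drops, because strip removes symbols and debug sections, not machine code.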

Your experiment is flawed, but the results very loosely indicate the expected result, provided you are producing code the same way. The odds of that are slim, so no, you shouldn't expect similar results unless you are producing code the same way.

Take this simple function:

unsigned int fun ( unsigned int a, unsigned int b)
{
    return(a+b+1);
}

The same compiler produced this:

00000000 <fun>:
   0:   e52db004    push    {r11}       ; (str r11, [sp, #-4]!)
   4:   e28db000    add r11, sp, #0
   8:   e24dd00c    sub sp, sp, #12
   c:   e50b0008    str r0, [r11, #-8]
  10:   e50b100c    str r1, [r11, #-12]
  14:   e51b2008    ldr r2, [r11, #-8]
  18:   e51b300c    ldr r3, [r11, #-12]
  1c:   e0823003    add r3, r2, r3
  20:   e2833001    add r3, r3, #1
  24:   e1a00003    mov r0, r3
  28:   e28bd000    add sp, r11, #0
  2c:   e49db004    pop {r11}       ; (ldr r11, [sp], #4)
  30:   e12fff1e    bx  lr

and this:

00000000 <fun>:
   0:   e2811001    add r1, r1, #1
   4:   e0810000    add r0, r1, r0
   8:   e12fff1e    bx  lr

because of different settings: 13 instructions vs. 3, over 4 times larger.
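Every instruction in both listings is a 4-byte ARM instruction, which the offsets confirm (the last instruction sits at 0x30 in the first listing and at 0x8 in the second), so the size ratio works out as:

$$
\begin{aligned}
13 \times 4\ \text{bytes} &= 52\ \text{bytes} \quad (\texttt{0x30} + 4 = \texttt{0x34} = 52)\\
3 \times 4\ \text{bytes} &= 12\ \text{bytes} \quad (\texttt{0x8} + 4 = \texttt{0xC} = 12)\\
52 / 12 &\approx 4.33
\end{aligned}
$$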

A human might generate this directly from the C, nothing fancy:

add r0,r0,r1
add r0,r0,#1
bx lr

I'm not sure whether, by order of operations, you technically have to add the one to b before adding that sum to a, or whether it doesn't matter. I went left to right; the compiler went right to left.
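On the order-of-operations point: C parses `a+b+1` as `(a+b)+1`, and for `unsigned int` the C standard defines arithmetic modulo `UINT_MAX + 1`, so addition is associative and commutative and both orders give the same result, even when the sum wraps. A minimal sketch that checks this on a few edge values; `fun_left`, `fun_right` and `orders_agree` are hypothetical names made up for this illustration:

```c
#include <limits.h>
#include <stddef.h>

/* How C parses the original expression: a + b + 1 == (a + b) + 1. */
unsigned int fun_left(unsigned int a, unsigned int b)
{
    return (a + b) + 1;
}

/* The order the optimized listing uses: b + 1 first, then add a. */
unsigned int fun_right(unsigned int a, unsigned int b)
{
    return a + (b + 1);
}

/* Returns 1 if both orders agree on every pair of edge values, including
   pairs whose sum wraps past UINT_MAX, and 0 otherwise. */
int orders_agree(void)
{
    const unsigned int vals[] = { 0u, 1u, 2u, UINT_MAX - 1u, UINT_MAX };
    const size_t n = sizeof vals / sizeof vals[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (fun_left(vals[i], vals[j]) != fun_right(vals[i], vals[j]))
                return 0;
    return 1;
}
```

This reasoning relies on the operands being unsigned; signed overflow is undefined behavior in C, so it does not carry over directly to `int`.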

So you could say that the compiler and my assembly produced the same number of bytes of binary, or you could say that the compiler produced something over 4 times larger.

Take the above and expand it into a real program that does useful things.

Exercise for the reader (the OP; please don't spoil it): figure out why the compiler can produce two different, correct solutions that are so different in size.

EDIT

.exe, ELF and other "binary" formats, as mentioned, can contain debug information and ASCII strings holding the names of functions/labels that make for pretty debug screens. These are part of the "binary" in that they are part of the baggage, but they are not machine code, nor data used when executing the program (at least not the stuff I am mentioning). Without changing the machine code or the data the program needs, you can manipulate the size of your .exe or other file format using compiler settings, so the same compiler-assembler-linker or assembler-linker path can make the binary file (in some senses of that word) larger or smaller by including or omitting this additional baggage.

That is part of understanding file sizes, and why, even if your hello world programs were different sizes, the overall files might be around the same size: if one is 10 bytes longer but the .exe is 40K, those 10 bytes are in the noise. But if I understand your question, those 10 bytes are what you are interested in: how they compare between compiled C and hand-written assembly.
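To get a feel for how much of a file is that kind of text baggage, a rough sketch in the spirit of the `strings` utility (which by default reports runs of at least 4 printable characters) can count the bytes that sit in such runs. The names below are hypothetical, and the count is only a heuristic: it also picks up string literals like "Hello, World" and printable bytes that happen to occur inside machine code.

```c
#include <stddef.h>
#include <stdio.h>

/* Running state for counting bytes in runs of printable ASCII. */
struct run_counter {
    size_t min_len; /* shortest run that counts, like strings -n */
    size_t run;     /* length of the current run */
    size_t total;   /* bytes in all runs of at least min_len */
};

static void rc_feed(struct run_counter *rc, unsigned char c)
{
    if (c >= 0x20 && c <= 0x7e) {    /* printable ASCII: space .. '~' */
        rc->run++;
    } else {
        if (rc->run >= rc->min_len)
            rc->total += rc->run;
        rc->run = 0;
    }
}

static size_t rc_finish(struct run_counter *rc)
{
    if (rc->run >= rc->min_len)      /* a run that reaches the end */
        rc->total += rc->run;
    rc->run = 0;
    return rc->total;
}

/* Bytes of buf that belong to printable runs of at least min_len. */
size_t printable_run_bytes(const unsigned char *buf, size_t len, size_t min_len)
{
    struct run_counter rc = { min_len, 0, 0 };
    for (size_t i = 0; i < len; i++)
        rc_feed(&rc, buf[i]);
    return rc_finish(&rc);
}

/* Same count for a whole file, streamed byte by byte.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int scan_file(const char *path, size_t min_len,
              size_t *file_bytes, size_t *text_bytes)
{
    struct run_counter rc = { min_len, 0, 0 };
    size_t size = 0;
    int c, err;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF) {
        size++;
        rc_feed(&rc, (unsigned char)c);
    }
    err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *file_bytes = size;
    *text_bytes = rc_finish(&rc);
    return 0;
}
```

Pointing `scan_file` at an unstripped executable and then at its stripped copy should show both numbers shrinking, since strip drops the symbol table and its string table of names.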

Also note that compilers are made by humans, so the output they produce is on par with what at least those humans can produce; other humans can do better, and many do worse, depending on your definition of better and worse.

old_timer answered Dec 15 '22 01:12


The 39+ KB size is absolutely not related to the compiler or the language used (C/C++ or asm). Different optimizations, debug information, etc. can change the size of this tiny code by, say, 1000 bytes, but not more. As a test, I built the following program:

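#include <Windows.h>
#include <stdio.h>
// Entry point set with /ENTRY:"ep", so the C runtime startup code is bypassed.
void ep(void* unused)
{
    // printf's return value (the number of characters written) becomes the exit code.
    ExitProcess(printf("Hello, World"));
}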

Linker options:

/INCREMENTAL:NO /NOLOGO /MANIFEST:NO /NODEFAULTLIB 
/SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /ENTRY:"ep" /MACHINE:X64 kernel32.lib msvcrt.lib

and got a 2560-byte exe for both x86 and x64.

What is different? /NODEFAULTLIB, and my version of msvcrt.lib, which is a pure import library.

The remaining 35+ KB comes from the statically linked C runtime. Even if you write the program in asm, you need to link against some library to get printf, and that library contains code that is statically linked with yours. That code is where the 35 KB is.

The question is not C++ vs. asm; there is no difference there. The question is whether or not you use the C runtime.

RbMm answered Dec 15 '22 00:12