
"C variable type sizes are machine dependent." Is it really true? (Also: signed and unsigned numbers)

Tags: c, types, assembly

I've been told that C types are machine dependent. Today I wanted to verify it.

void legacyTypes()
{
    /* character types */
    char k_char = 'a';

        //Signedness --> signed & unsigned
        signed char k_char_s = 'a';
        unsigned char k_char_u = 'a';

    /* integer types */
    int k_int = 1; /* Same as "signed int" */

        //Signedness --> signed & unsigned
        signed int k_int_s = -2;
        unsigned int k_int_u = 3;

        //Size --> short, _____,  long, long long
        short int k_s_int = 4;
        long int k_l_int = 5;
        long long int k_ll_int = 6;

    /* real number types */
        float k_float = 7;
        double k_double = 8;
}

I compiled it on a 32-bit machine using the MinGW C compiler:

_legacyTypes:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $48, %esp
    movb    $97, -1(%ebp)      # char
    movb    $97, -2(%ebp)      # signed char
    movb    $97, -3(%ebp)      # unsigned char
    movl    $1, -8(%ebp)       # int
    movl    $-2, -12(%ebp)     # signed int
    movl    $3, -16(%ebp)      # unsigned int
    movw    $4, -18(%ebp)      # short int
    movl    $5, -24(%ebp)      # long int
    movl    $6, -32(%ebp)      # long long int (low 32 bits)
    movl    $0, -28(%ebp)      # long long int (high 32 bits)
    movl    $0x40e00000, %eax  # float 7.0f (IEEE 754 bit pattern)
    movl    %eax, -36(%ebp)
    fldl    LC2                # double 8.0, loaded from constant LC2 (not shown)
    fstpl   -48(%ebp)
    leave
    ret

I compiled the same code on a 64-bit processor (Intel Core 2 Duo) with GCC on Linux:

legacyTypes:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movb    $97, -1(%rbp)      # char
    movb    $97, -2(%rbp)      # signed char
    movb    $97, -3(%rbp)      # unsigned char
    movl    $1, -12(%rbp)      # int
    movl    $-2, -16(%rbp)     # signed int
    movl    $3, -20(%rbp)      # unsigned int
    movw    $4, -6(%rbp)       # short int
    movq    $5, -32(%rbp)      # long int
    movq    $6, -40(%rbp)      # long long int
    movl    $0x40e00000, %eax  # float 7.0f (IEEE 754 bit pattern)
    movl    %eax, -24(%rbp)
    movabsq $4620693217682128896, %rax  # double 8.0 (bit pattern 0x4020000000000000)
    movq    %rax, -48(%rbp)
    leave
    ret
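The two odd-looking immediates in these listings are not integer values: 0x40e00000 is the IEEE 754 single-precision encoding of 7.0f, and 4620693217682128896 (0x4020000000000000) is the double-precision encoding of 8.0, which the 32-bit build loads from the constant LC2 instead. A minimal sketch to confirm this, assuming float and double are 32- and 64-bit IEEE 754 types (as they are with both compilers above); float_bits and double_bits are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and double are 32- and 64-bit IEEE 754 types. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");
_Static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits");

/* Copies the bytes of a float into a same-sized integer, yielding the raw
   bit pattern (memcpy avoids the aliasing problems of a pointer cast). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Same idea for double: the result is the immediate the compiler stored. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With these, float_bits(7.0f) yields 0x40e00000 and double_bits(8.0) yields 0x4020000000000000, matching the constants in both listings.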

Observations

  • char, signed char, unsigned char, int, unsigned int, signed int, short int, unsigned short int and signed short int all occupy the same number of bytes on both the 32-bit and the 64-bit processor.

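  • The only change is in long int, which occupies 32 bits on the 32-bit machine and 64 bits on the 64-bit machine. (long long int is 64 bits on both: the 32-bit build stores it as two 32-bit halves, at -32 and -28 from %ebp.)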

  • Pointers also change: they take 32 bits on the 32-bit CPU and 64 bits on the 64-bit CPU.
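A more direct way to check these observations is to ask the compiler with sizeof rather than reading assembly. The sketch below is only an illustration: show, print_type_widths and check_standard_guarantees are hypothetical helper names, and the printed sizes depend on the compiler and target that build the file. The assertions encode what the C standard does guarantee (minimum ranges and an ordering of ranges), which is all a portable program may rely on.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Prints one line: the type name and its storage size in bits. */
static void show(const char *name, size_t bytes)
{
    printf("%-10s: %u bits\n", name, (unsigned)(bytes * CHAR_BIT));
}

/* Hypothetical helper: reports the sizes chosen by the compiler that
   builds this file, instead of inferring them from generated assembly. */
void print_type_widths(void)
{
    show("char", sizeof(char));
    show("short", sizeof(short));
    show("int", sizeof(int));
    show("long", sizeof(long));
    show("long long", sizeof(long long));
    show("float", sizeof(float));
    show("double", sizeof(double));
    show("void *", sizeof(void *));
}

/* What the C standard promises: minimum ranges, and that each integer
   type's range includes the range of the narrower types. These hold on
   every conforming compiler; the exact sizes printed above need not. */
void check_standard_guarantees(void)
{
    assert(sizeof(char) == 1);
    assert(CHAR_BIT >= 8);
    assert(SHRT_MAX >= 32767);
    assert(INT_MAX >= 32767);
    assert(LONG_MAX >= 2147483647L);
    assert(LLONG_MAX >= 9223372036854775807LL);
    assert(SHRT_MAX <= INT_MAX);
    assert(INT_MAX <= LONG_MAX);
    assert(LONG_MAX <= LLONG_MAX);
}
```

Calling both from main on the 32-bit MinGW build and the 64-bit Linux GCC build should reproduce the long and pointer differences listed above, while the assertions pass on both.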

Questions:

  • I can't say the books are wrong, but I'm missing something here. What exactly does "variable types are machine dependent" mean?
  • As you can see, there is no difference between the instructions for unsigned and signed numbers. Then how come the ranges of values they can represent differ?
  • I was reading the question "How to maintain fixed size of C variable types over different machines?" and didn't understand its purpose or its answers. What is there to keep fixed, when the sizes are all the same? I don't see how those answers would guarantee the same size.

EDIT:

Isn't it impossible to provide the same size on different machines? I mean, how can one keep the same pointer size on both a 64-bit and a 32-bit machine?

asked Nov 29 '22 06:11 by claws


2 Answers

There are many more platforms out there, and some of them are 16-bit or even 8-bit! On these, you would observe much bigger differences in the sizes of all the types above.

Signed and unsigned versions of the same basic type occupy the same number of bytes on any platform; however, their ranges differ. Both have the same number of possible bit patterns, but a signed type splits them between negative and non-negative values, while an unsigned type uses all of them for non-negative values.

E.g. a 16-bit signed int can hold values from -32767 (the minimum the C standard guarantees; -32768 on two's-complement platforms, i.e. nearly all of them) to 32767. An unsigned int of the same size has the range 0 to 65535.
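To make this concrete (and to address the question about identical instructions): the stored bits are the same, and only their interpretation differs. That interpretation shows up in the instructions that compare, divide, shift or widen values, not in the plain mov that stores a constant, which is all the question's function does. A minimal sketch, assuming two's complement (true of essentially all current hardware); the function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Reinterprets the raw bits of a short as an unsigned short.
   memcpy copies the bytes unchanged; no value conversion happens. */
unsigned short same_bits_as_unsigned(short s)
{
    unsigned short u;
    memcpy(&u, &s, sizeof u);
    return u;
}

/* Storing a value uses the same mov for both types, but comparing does not:
   on x86, compilers typically emit a signed condition (setl/jl) here ... */
int less_signed(int a, int b)
{
    return a < b;
}

/* ... and an unsigned condition (setb/jb) here, for the same bit patterns. */
int less_unsigned(unsigned a, unsigned b)
{
    return a < b;
}
```

For example, the 16-bit pattern of -1 reads back as 65535 through same_bits_as_unsigned, and less_signed(-1, 1) is 1 while less_unsigned((unsigned)-1, 1u) is 0, because the same all-ones pattern means -1 in one case and UINT_MAX in the other.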

After this, hopefully you understand the point of the referenced question better. Basically, if you write a program assuming that e.g. your signed int variables will be able to hold the value 2*10^9 (2 billion), your program is not portable, because on some platforms (those where int is 16 bits) this value overflows. Signed overflow is undefined behavior in C, which typically results in silent, hard-to-find bugs. So e.g. on a 16-bit platform you need to #define your ints to be long (which is guaranteed to be at least 32 bits) in order to avoid overflow. This is a simple example, which may not work across all platforms, but I hope it gives you the basic idea.
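Since C99, the <stdint.h> header offers an alternative to a hand-written #define: types such as int32_t have exactly the stated width on every implementation that provides them, so the range no longer depends on the size of int. A minimal sketch, where add_totals and print_total are hypothetical names and the 2-billion total mirrors the example above:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical example: a total that may reach about 2 billion.
   int32_t is exactly 32 bits wherever it exists, so this range is the
   same on 16-, 32- and 64-bit targets. */
int32_t add_totals(int32_t a, int32_t b)
{
    /* Signed overflow is undefined behavior, so check before adding. */
    assert(b <= 0 || a <= INT32_MAX - b);
    assert(b >= 0 || a >= INT32_MIN - b);
    return a + b;
}

/* PRId32 expands to the right printf conversion for int32_t on each target. */
void print_total(void)
{
    int32_t total = add_totals(1000000000, 1000000000);
    printf("total = %" PRId32 "\n", total);
}
```

Strictly speaking, int32_t is optional (it exists only where the hardware has such a type); int_least32_t and long are always at least 32 bits and are the fully portable choices.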

The reason for all these differences between platforms is that by the time C was standardized, there were already many C compilers in use on a plethora of different platforms, so for backward compatibility all these variations had to be accepted as valid.

answered Dec 04 '22 01:12 by Péter Török


"Machine dependent" is not quite accurate: the sizes are actually implementation-defined. They may depend on the compiler, the machine, compiler options, etc.

For example, with Visual C++, long is 32 bits even on 64-bit machines.
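The combination of int, long and pointer sizes an implementation picks is usually called its data model: Visual C++ on 64-bit Windows uses LLP64 (long stays 32 bits), while GCC on 64-bit Linux, the question's second build, uses LP64 (long becomes 64 bits). This is also why pointer size cannot be kept the same across targets; it is fixed by the data model. A minimal sketch that reports the model at run time; data_model is a hypothetical name and the labels are conventional, not defined by the C standard:

```c
/* Hypothetical classifier for the common data models. Other combinations
   exist (16-bit targets, for example) and are reported as "other". */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* e.g. 32-bit Windows and 32-bit Linux */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";   /* e.g. 64-bit Windows (Visual C++, MinGW-w64) */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* e.g. 64-bit Linux and macOS */
    return "other";
}
```

Built with 32-bit MinGW this should report ILP32, and with GCC on 64-bit Linux LP64, the two configurations compared in the question.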

answered Dec 04 '22 03:12 by oefe