
GCC Inline-Assembly Error: "Operand size mismatch for 'int'"

First, if anybody knows a Standard C Library function that prints a string without looking for a terminating zero byte, and instead takes the number of characters to print, please tell me!

Otherwise, I have this problem:

void printStringWithLength(char *str_ptr, int n_chars){

asm("mov 4, %rax");//Function number (write)
asm("mov 1, %rbx");//File descriptor (stdout)
asm("mov $str_ptr, %rcx");
asm("mov $n_chars, %rdx");
asm("int 0x80");
return;

}

GCC reports the following error for the "int" instruction:

"Error: operand size mismatch for 'int'"

Can somebody tell me the issue?

toskana98 asked Dec 08 '22 17:12



2 Answers

There are a number of issues with your code. Let me go over them step by step.

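First of all, the int $0x80 system call interface is for 32-bit code only. You should not use it in 64-bit code, as it only accepts 32-bit arguments, so a 64-bit pointer gets truncated. In 64-bit code, use the syscall instruction instead. The system calls are similar, but the call numbers and argument registers differ: write is number 4 for int $0x80 but number 1 for syscall, and syscall takes its arguments in rdi, rsi and rdx rather than ebx, ecx and edx.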

Second, in AT&T assembly syntax, immediates must be prefixed with a dollar sign. So it's mov $4, %rax, not mov 4, %rax. The latter would attempt to move the content of address 4 to rax which is clearly not what you want.

Third, you can't just refer to the names of automatic variables in inline assembly. You have to tell the compiler what variables you want to use using extended assembly if you need any. For example, in your code, you could do:

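// write is system call 1 in the 64-bit syscall ABI; syscall itself destroys RCX and R11
asm volatile("mov $1, %%eax; mov $1, %%edi; mov %0, %%rsi; mov %1, %%edx; syscall"
    :: "r"(str_ptr), "r"(n_chars)
    : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory");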

Fourth, gcc is an optimizing compiler. By default it assumes that inline assembly statements are like pure functions, that the outputs are a pure function of the explicit inputs. If the output(s) are unused, the asm statement can be optimized away, or hoisted out of loops if run with the same inputs.

But a system call like write has a side-effect you need the compiler to keep, so it's not pure. You need the asm statement to run the same number of times and in the same order as the C abstract machine would. asm volatile will make this happen. (An asm statement with no outputs is implicitly volatile, but it's good practice to make it explicit when the side effect is the main purpose of the asm statement. Plus, we do want to use an output operand to tell the compiler that RAX is modified, as well as being an input, which we couldn't do with a clobber.)

You do always need to accurately describe your asm's inputs, outputs, and clobbers to the compiler using Extended inline assembly syntax. Otherwise you'll step on the compiler's toes (it assumes registers are unchanged unless they're outputs or clobbers). (Related: the question "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?" shows that a pointer input operand alone does not imply that the pointed-to memory is also an input. Use a dummy "m" input or a "memory" clobber to force all reachable memory to be in sync.)

You should simplify your code by not writing your own mov instructions to put data into registers but rather letting the compiler do this. For example, your assembly becomes:

ssize_t retval;
asm volatile ("syscall"            // note only 1 instruction in the template
    : "=a"(retval)                 // RAX gets the return value
    : "a"(SYS_write), "D"(STDOUT_FILENO), "S"(str_ptr), "d"((size_t)n_chars)  // widen the int so all of RDX is defined
    : "memory", "rcx", "r11"       // syscall destroys RCX and R11
  );

where SYS_write is defined in <sys/syscall.h> and STDOUT_FILENO in <unistd.h>. I am not going to explain all the details of extended inline assembly to you. Using inline assembly is usually a bad idea. Read the documentation if you are interested. (https://stackoverflow.com/tags/inline-assembly/info)

Fifth, you should avoid using inline assembly when you can. For example, to do system calls, use the syscall function from unistd.h:

syscall(SYS_write, STDOUT_FILENO, str_ptr, (size_t)n_chars);

This does the right thing, but it doesn't inline into your code. If you really want to inline a system call instead of calling a libc function, use wrapper macros such as the ones in musl.
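As a rough sketch of that call wrapped in a function (the name print_n_syscall is hypothetical, and this assumes Linux with a libc that declares syscall() in <unistd.h>, which is an extension rather than part of POSIX):

```c
#include <stddef.h>       /* size_t */
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall(), STDOUT_FILENO */

/* Hypothetical helper: writes n bytes of s to stdout through syscall().
 * Returns the number of bytes written, or -1 with errno set. */
static long print_n_syscall(const char *s, size_t n)
{
    return syscall(SYS_write, STDOUT_FILENO, s, n);
}
```

Unlike a raw syscall instruction, the libc syscall() function converts a kernel error into a -1 return value and sets errno.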

Sixth, always check whether the system call you want is already available as a wrapper function in your C library. In this case, it is, so you should just write

write(STDOUT_FILENO, str_ptr, n_chars);

and avoid all of this altogether.

Seventh, if you prefer to use stdio, use fwrite instead:

fwrite(str_ptr, 1, n_chars, stdout);
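
On the opening question about a standard function that takes a length: besides fwrite, printf accepts a precision for %s that limits how many characters it prints. A small sketch of both follows; the helper names are made up. The difference is that fwrite writes exactly n bytes, while %.*s stops early if it meets a zero byte.

```c
#include <stdio.h>

/* Hypothetical helper: writes exactly n bytes of s to stdout,
 * embedded zero bytes included. Returns the number of bytes accepted. */
static size_t print_n_fwrite(const char *s, size_t n)
{
    return fwrite(s, 1, n, stdout);
}

/* Hypothetical helper: prints at most n characters of s, stopping early
 * at a zero byte. Returns printf's result (characters printed). */
static int print_n_printf(const char *s, int n)
{
    return printf("%.*s", n, s);
}
```

Both go through stdio buffering, so call fflush(stdout) if the output has to be ordered against direct write calls.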
fuz answered Dec 15 '22 00:12



There are so many things wrong with your code (and so little reason to use inline asm for it) that it's not worth trying to actually correct all of them. Instead, use the write(2) system call the normal way, via the POSIX function / libc wrapper as documented in the man page, or use ISO C <stdio.h> fwrite(3).

#include <unistd.h>

static inline
void printStringWithLength(const char *str_ptr, int n_chars){
    write(1, str_ptr, n_chars);
    // TODO: check error return value
}
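
For the TODO, one common pattern is to loop until everything is written, because write may write fewer bytes than requested or be interrupted by a signal. This is only a sketch, and the name write_all is made up:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes all n bytes of buf to fd, retrying after
 * partial writes and EINTR. Returns 0 on success, -1 on error (errno set). */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t written = write(fd, buf, n);
        if (written < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing anything: retry */
            return -1;             /* real error; errno says which */
        }
        buf += written;            /* skip the part that was written */
        n -= (size_t)written;
    }
    return 0;
}
```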

Why your code doesn't assemble:

In AT&T syntax, immediates always need a $ decorator. Your code will assemble if you use asm("int $0x80").

The assembler is complaining about 0x80, which is a memory reference to the absolute address 0x80. There is no form of int that takes the interrupt vector as anything other than an immediate. I'm not sure exactly why it complains about the size, since memory references don't have an implied size in AT&T syntax.


That will get it to assemble, at which point you'll get linker errors:

In function `printStringWithLength':
5 : <source>:5: undefined reference to `str_ptr'
6 : <source>:6: undefined reference to `n_chars'
collect2: error: ld returned 1 exit status

(from the Godbolt compiler explorer)

mov $str_ptr, %rcx

means to mov-immediate the address of the symbol str_ptr into %rcx. In AT&T syntax, you don't have to declare external symbols before using them, so unknown names are assumed to be global / static labels. If you had a global variable called str_ptr, that instruction would reference its address (which is a link-time constant, so can be used as an immediate).


As others have said, this is completely the wrong way to go about things with GNU C inline asm. See the inline-assembly tag wiki for more links to guides.

Also, you're using the wrong ABI. int $0x80 is the x86 32-bit system call ABI, so it doesn't work with 64-bit pointers. See: What are the calling conventions for UNIX & Linux system calls on x86-64

See also the x86 tag wiki.

Peter Cordes answered Dec 14 '22 22:12
