
Strange behavior when trying to compile a source with tcc against a gcc-generated .o file

Tags:

c++

c

gcc

mingw

tcc

I am trying to compile a source file with tcc (version 0.9.26) against a gcc-generated .o file, but I get strange behavior. The gcc (version 5.3.0) is from MinGW 64-bit.

More specifically, I have the following two files (te1.c, te2.c). I ran the following commands on a Windows 7 box:

c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O elf64-x86-64 te1.o   # needed because te1.o from the previous step is in COFF format, and tcc only understands ELF
c:\tcc> tcc te2.c te1.o
c:\tcc> te2.exe
567in dummy!!!

Note that the first 4 bytes were cut off from the string "1234567in dummy!!!\n". I wonder what could have gone wrong.

Thanks Jin

========file te1.c===========

#include <stdio.h>

void dummy () {
    printf1("1234567in dummy!!!\n");
}

========file te2.c===========

#include <stdio.h>

void printf1(char *p) {
    printf("%s\n",p);
}
extern void dummy();
int main(int argc, char *argv[]) {
    dummy();
    return 0;
}

Update 1

I saw a difference in the assembly between te1.o (te1.c compiled by tcc) and te1_gcc.o (te1.c compiled by gcc). In the tcc-compiled object I see lea -0x4(%rip),%rcx; in the gcc-compiled object I see lea 0x0(%rip),%rcx. I'm not sure why.

C:\temp>objdump -d te1.o

te1.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <dummy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 81 ec 20 00 00 00    sub    $0x20,%rsp
   b:   48 8d 0d fc ff ff ff    lea    -0x4(%rip),%rcx        # e <dummy+0xe>
  12:   e8 fc ff ff ff          callq  13 <dummy+0x13>
  17:   c9                      leaveq
  18:   c3                      retq
  19:   00 00                   add    %al,(%rax)
  1b:   00 01                   add    %al,(%rcx)
  1d:   04 02                   add    $0x2,%al
  1f:   05 04 03 01 50          add    $0x50010304,%eax

C:\temp>objdump -d te1_gcc.o

te1_gcc.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <dummy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f <dummy+0xf>
   f:   e8 00 00 00 00          callq  14 <dummy+0x14>
  14:   90                      nop
  15:   48 83 c4 20             add    $0x20,%rsp
  19:   5d                      pop    %rbp
  1a:   c3                      retq
  1b:   90                      nop
  1c:   90                      nop
  1d:   90                      nop
  1e:   90                      nop
  1f:   90                      nop

Update 2

Using a binary editor, I patched the machine code in te1.o (produced by gcc), changing lea 0x0(%rip),%rcx to lea -0x4(%rip),%rcx. After linking it with tcc, the resulting exe works fine. More precisely, I did:

c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O  elf64-x86-64 te1.o 
c:\tcc> (use a binary editor to change the bytes from 48 8d 0d 00 00 00 00 to 48 8d 0d fc ff ff ff)
c:\tcc> tcc te2.c te1.o
c:\tcc> te2
1234567in dummy!!!

Update 3

As requested, here is the output of objdump -r te1.o

C:\temp>gcc -c te1.c

C:\temp>objdump -r te1.o

te1.o:     file format pe-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
000000000000000b R_X86_64_PC32     .rdata
0000000000000010 R_X86_64_PC32     printf1


RELOCATION RECORDS FOR [.pdata]:
OFFSET           TYPE              VALUE
0000000000000000 rva32             .text
0000000000000004 rva32             .text
0000000000000008 rva32             .xdata



C:\temp>objdump -d te1.o

te1.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <dummy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f <dummy+0xf>
   f:   e8 00 00 00 00          callq  14 <dummy+0x14>
  14:   90                      nop
  15:   48 83 c4 20             add    $0x20,%rsp
  19:   5d                      pop    %rbp
  1a:   c3                      retq
  1b:   90                      nop
  1c:   90                      nop
  1d:   90                      nop
  1e:   90                      nop
  1f:   90                      nop
asked Jul 12 '16 19:07 by packetie



2 Answers

This has nothing to do with tcc or calling conventions. It has to do with the different relocation conventions of the elf64-x86-64 and pe-x86-64 formats.

With PE, the linker implicitly subtracts 4 when calculating the final offset, because the displacement is measured from the byte following the 4-byte field.

With ELF, it does not do this. Because of this, 0 is the correct initial value for PE, and -4 is correct for ELF.

Unfortunately, objcopy does not adjust this value when converting, which is a bug in objcopy.

answered Mar 01 '23 03:03 by h1n1


Add

extern void printf1(char *p);

to your te1.c file.

Otherwise, with no prototype in scope, the compiler falls back to an implicit declaration; my suspicion was that the 64-bit pointer argument could be mishandled as a 32-bit int.

Edit: this still doesn't work. I found that the function never returns (calling printf1 a second time does nothing!). It seems that the first 4 bytes are consumed as a return address or something like that. In gcc 32-bit mode it works fine. It sounds like a calling-convention problem to me, but I still cannot figure it out. Another clue: calling printf from the te1.c side (gcc, using the tcc stdlib bindings) crashes with a segfault.

I disassembled the executable. The first part shows repeated calls from the tcc side:

  40104f:       48 8d 05 b3 0f 00 00    lea    0xfb3(%rip),%rax        # 0x402009
  401056:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  40105a:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  40105e:       e8 9d ff ff ff          callq  0x401000
  401063:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401067:       e8 94 ff ff ff          callq  0x401000
  40106c:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401070:       e8 8b ff ff ff          callq  0x401000
  401075:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401079:       e8 82 ff ff ff          callq  0x401000
  40107e:       e8 0d 00 00 00          callq  0x401090
  401083:       b8 00 00 00 00          mov    $0x0,%eax
  401088:       e9 00 00 00 00          jmpq   0x40108d
  40108d:       c9                      leaveq
  40108e:       c3                      retq

The second part shows repeated calls (6 times) to the same function from the gcc side. As you can see, the target address is different: it is shifted by 4 bytes, just like your data!!! It kind of works just once, because the first 4 bytes of the function are these two instructions:

 401000:       55                      push   %rbp
 401001:       48 89 e5                mov    %rsp,%rbp

so the stack frame is corrupted if those are skipped!!

  40109f:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  4010a3:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010a7:       48 89 c1                mov    %rax,%rcx
  4010aa:       e8 55 ff ff ff          callq  0x401004
  4010af:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010b3:       48 89 c1                mov    %rax,%rcx
  4010b6:       e8 49 ff ff ff          callq  0x401004
  4010bb:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010bf:       48 89 c1                mov    %rax,%rcx
  4010c2:       e8 3d ff ff ff          callq  0x401004
  4010c7:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010cb:       48 89 c1                mov    %rax,%rcx
  4010ce:       e8 31 ff ff ff          callq  0x401004
  4010d3:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010d7:       48 89 c1                mov    %rax,%rcx
  4010da:       e8 25 ff ff ff          callq  0x401004
  4010df:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010e3:       48 89 c1                mov    %rax,%rcx
  4010e6:       e8 19 ff ff ff          callq  0x401004
  4010eb:       90                      nop
answered Mar 01 '23 02:03 by Jean-François Fabre