
How to go from assembler instruction to C code

I have an assignment where, among other things, I need to look in an .asm file to find a certain instruction and "reverse engineer" (find out) what part of the C code causes it to be executed on an assembler level. (Example below the text)

What would be the fastest (easiest) way to do this? Or, better said, which other instructions or labels around it in the .asm file should I pay attention to, so that they guide me to the right C code?

I have next to zero experience with assembler code, and it is tough to figure out exactly which lines of C code cause a particular instruction to be generated.

The architecture, if that makes any difference, is TriCore.

Example: I managed to figure out which C code causes an insert instruction in the .asm file by following where the variables are used:

.L23:
    movh.a  a15,#@his(InsertStruct)
    ld.bu   d15,[a15]@los(InsertStruct)
    or      d15,#1
    st.b    [a15]@los(InsertStruct),d15
.L51:
    ld.bu   d15,[a15]@los(InsertStruct)
    insert  d15,d15,#0,#0,#1
    st.b    [a15]@los(InsertStruct),d15
.L17:
    mov     d15,#-1

That led me to the following C code:

InsertStruct.SomeMember = 0x1u;

InsertStruct.SomeMember = 0x0u;
vandelfi asked Dec 18 '17 10:12

1 Answer

"The architecture is TriCore (if that makes any difference)."

Of course. Assembler code is always architecture-specific.

"... what part of the C code causes it to be executed on an assembler level."

With a highly optimizing compiler you have almost no chance:

For example, the TASKING compiler for TriCore sometimes generates a single fragment of assembly code (stored only once in memory!) for two different lines of C code in two different C files.

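However, the code in your example is not optimized (unless the structure you named InsertStruct is volatile): an optimizer would otherwise drop the first write, because the second one immediately overwrites it.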
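To picture why the listing loads, modifies and stores the same byte twice, a declaration along the following lines would fit. This is only a guess: the question does not show the definition of InsertStruct, so the type name InsertStruct_t, the 1-bit width of SomeMember, the Reserved padding and the volatile qualifier are all assumptions.

```c
/* Hypothetical layout: the real definition is not shown in the question. */
typedef struct {
    unsigned int SomeMember : 1;  /* assumed 1-bit field at bit 0 */
    unsigned int Reserved   : 7;  /* assumed padding bits */
} InsertStruct_t;

/* Assumed to be volatile, so both writes below have to be performed. */
volatile InsertStruct_t InsertStruct;

void set_then_clear(void)
{
    InsertStruct.SomeMember = 0x1u;  /* read byte, set bit 0, write byte */
    InsertStruct.SomeMember = 0x0u;  /* read byte, clear bit 0, write byte */
}
```

Without volatile, an optimizing compiler would be free to keep only the second assignment.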

In this case you can compile your code with debugging information switched on and then extract that information. For a file in ELF format you can use a tool such as addr2line (free software, part of GNU Binutils) to check which line of C code corresponds to the instruction at a certain address.

(Note: The addr2line tool is architecture independent as long as both architectures have the same word width (32-bit) and the same endianness and both use the ELF file format; you could use an addr2line built for ARM to get the information from a TriCore file.)

When I really have to understand a fragment of assembler code, I typically do the following:

I start a text editor and paste in the assembler code:

movh.a  a15,#@his(InsertStruct)
ld.bu   d15,[a15]@los(InsertStruct)
or      d15,#1
st.b    [a15]@los(InsertStruct),d15
...

Then I replace each instruction with its pseudo-code equivalent:

a15 =  ((((unsigned)&InsertStruct)>>16)<<16);
d15 =  *(unsigned char *)(a15 + (((unsigned)&InsertStruct)&0xFFFF));
d15 |= 1;
*(unsigned char *)(a15 + (((unsigned)&InsertStruct)&0xFFFF)) = d15;
...
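The same kind of replacement works for the insert instruction in the question's listing. The helper below is a sketch of how I read `insert d15,d15,#0,#0,#1`, taking the last three operands as value, bit position and field width; the name tricore_insert is made up for illustration, and the operand order should be checked against the TriCore instruction set manual.

```c
#include <stdint.h>

/* Hypothetical model of: insert dc, da, value, pos, width
 * Assumes pos < 32 and pos + width <= 32. */
static uint32_t tricore_insert(uint32_t da, uint32_t value,
                               unsigned pos, unsigned width)
{
    /* 'width' one-bits, shifted up so that they start at bit 'pos'. */
    uint32_t field = (width >= 32u) ? 0xFFFFFFFFu
                                    : (((uint32_t)1 << width) - 1u);
    uint32_t mask  = field << pos;

    /* Keep every bit outside the field, replace the bits inside it. */
    return (da & ~mask) | ((value << pos) & mask);
}
```

Under that reading, the instruction becomes `d15 = tricore_insert(d15, 0, 0, 1);`, which simplifies to `d15 &= ~1u;`, i.e. clearing bit 0.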

In the next step I try to simplify this code:

a15 =  ((unsigned)&InsertStruct) & 0xFFFF0000;

Then:

d15 = *(unsigned char *)((((unsigned)&InsertStruct) & 0xFFFF0000) + (((unsigned)&InsertStruct)&0xFFFF));
...

Then:

d15 = *(unsigned char *)((unsigned)&InsertStruct);
...

Then:

d15 = *(unsigned char *)&InsertStruct;
...
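The simplification relies on the high and low halves adding back up to the original address. A small sketch to convince yourself of that is shown here; the function names are invented. The second variant reflects my assumption that the assembler treats the low half as a signed 16-bit offset and rounds the high half to compensate, in which case the sum is still the same.

```c
#include <stdint.h>

/* Split as written in the pseudo-code above: plain high and low halves. */
static uint32_t join_simple(uint32_t addr)
{
    uint32_t high = addr & 0xFFFF0000u;  /* value loaded by movh.a */
    uint32_t low  = addr & 0x0000FFFFu;  /* offset used by ld.bu / st.b */
    return high + low;
}

/* Assumed variant: signed 16-bit low half, high half rounded to match. */
static uint32_t join_signed(uint32_t addr)
{
    uint32_t high = (addr + 0x8000u) & 0xFFFF0000u;
    int32_t  low  = (int32_t)(addr & 0xFFFFu);

    if (low >= 0x8000) {
        low -= 0x10000;                  /* sign-extend the 16-bit offset */
    }
    return high + (uint32_t)low;         /* unsigned arithmetic wraps mod 2^32 */
}
```

Both functions return their argument unchanged for every 32-bit address, which is why the whole expression collapses to `&InsertStruct`.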

In the end I try to replace jump instructions:

d15 = 0;
if(d14 == d13) goto L123;
d15 = 1;
L123:

... becomes:

d15 = 0;
if(d14 != d13) d15 = 1;

... and finally (maybe):

d15 = (d14 != d13);
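If you are unsure whether such a rewrite preserved the meaning, you can put each stage into a small function and compare the results. The sketch below does that for the three forms above; the function names are hypothetical and the registers are modelled as plain int parameters.

```c
/* Stage 1: literal translation of the conditional jump. */
static int with_goto(int d14, int d13)
{
    int d15 = 0;
    if (d14 == d13) goto L123;
    d15 = 1;
L123:
    return d15;
}

/* Stage 2: the jump replaced by an inverted condition. */
static int with_if(int d14, int d13)
{
    int d15 = 0;
    if (d14 != d13) d15 = 1;
    return d15;
}

/* Stage 3: the whole sequence as one expression. */
static int with_expr(int d14, int d13)
{
    return (d14 != d13);
}
```

All three return 0 when the two inputs are equal and 1 otherwise.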

In the end you have C code in the text editor.

Unfortunately this takes a lot of time, but I don't know of any faster method.

Martin Rosenau answered Sep 26 '22 20:09