
Convert ASM to C (not reverse engineer)

I googled this and found a surprising number of flippant responses, basically laughing at the asker for asking such a question.

Microchip provides some source code for free (I don't want to post it here in case that's a no-no. Basically, google AN937, click the first link, and there's a link for "source code" that gives you a zipped file). It's in ASM, and when I look at it I start to go cross-eyed. I'd like to convert it to something resembling a C-type language so that I can follow along, because lines such as:

GLOBAL  _24_bit_sub
movf    BARGB2,w
subwf   AARGB2,f

are probably very simple but they mean nothing to me.

There may be some automated ASM to C translator out there, but all I can find are people saying it's impossible. Frankly, it's impossible for it to be impossible. Both languages have structure, and that structure surely can be translated.

Steven asked Sep 04 '09 01:09



1 Answer

You can absolutely make a C program from assembler. The problem is that it may not look like what you are expecting, or maybe it will. My PIC is rusty, but using another assembler, say you had

add r1,r2 

In C, let's say that becomes

r1 = r1 + r2; 

Possibly more readable. You may lose any sense of variable names, though, as values jump from memory to registers and back and the registers get reused. If you are talking about the older PICs that had, what, two registers (an accumulator and one other), it actually might be easier, because variables lived in memory for the most part. You look at the address and get something like

q = mem[0x12];
e = q;
q = mem[0x13];
e = e + q;
mem[0x12] = e;

Long and drawn out, but it is clear that this amounts to mem[0x12] = mem[0x12] + mem[0x13];

These memory locations are likely variables, and they will not jump around the way they do in compiled C code for a processor with a bunch of registers. The PIC might make it easier to figure out the variables, after which you can do a search and replace to name them across the file.
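As a rough illustration, the two instructions quoted in the question could be written out by hand along these lines. This is only a sketch: treating W, AARGB2 and BARGB2 as plain 8-bit globals, the `carry` variable, and the helper name `sub_low_byte` are all assumptions made for the example, not something taken from the app note.

```c
#include <stdint.h>

/* Names come from the two asm lines in the question; modelling them
   as plain 8-bit globals is an assumption of this sketch. */
uint8_t w;        /* working register W */
uint8_t AARGB2;   /* one byte of the first operand */
uint8_t BARGB2;   /* one byte of the second operand */
uint8_t carry;    /* models the carry flag: 1 means no borrow */

/* Hypothetical helper name; mirrors the two asm lines one for one. */
void sub_low_byte(void)
{
    w = BARGB2;                      /* movf  BARGB2,w */
    carry = (AARGB2 >= w);           /* subwf sets carry when no borrow occurs */
    AARGB2 = (uint8_t)(AARGB2 - w);  /* subwf AARGB2,f */
}
```

Once the memory locations have names, the body collapses to AARGB2 = AARGB2 - BARGB2; with the carry flag telling the next byte of the 24-bit subtraction whether to borrow.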

What you are looking for is called static binary translation: not necessarily a translation from one binary to another (one processor to another), but in this case a translation from a PIC binary to C. Ideally you would take the assembler given in the app note, assemble it to a binary using the Microchip tools, and then do the translation. You can do dynamic binary translation as well, but you are even less likely to find one of those, and it doesn't normally result in C but goes from one binary to another. Ever wonder how those $15 joysticks at Wal-Mart with Pac-Man and Galaga work? The ROM from the arcade was converted using static binary translation, optimized and cleaned up, and the C (or whatever intermediate language) was compiled for the new target processor in the handheld box. I imagine not all of them were done this way, but I am pretty sure some were.

The million dollar question: can you find a static binary translator for a PIC? Who knows; you probably have to write one yourself. And guess what that means: you write a disassembler, and instead of disassembling to an instruction in the native assembler syntax, like add r0,r1, you have your disassembler print out r0=r0+r1; By the time you finish this disassembler, though, you will know the PIC assembly language so well that you won't need the ASM to C translator. You have a chicken and egg problem.
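To make the "disassembler that prints C" idea concrete, here is a minimal sketch. The 16-bit instruction encoding, the opcode values, and the function name `insn_to_c` are all invented for illustration; this is not a real PIC instruction format, and a real translator would need the actual opcode tables from the datasheet.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding, invented for this sketch (NOT a real PIC format):
   bits 15..12 = opcode, bits 11..8 = destination register,
   bits 7..4 = source register, bits 3..0 unused. */
enum { OP_MOV = 0x1, OP_ADD = 0x2, OP_SUB = 0x3 };

/* Writes one C statement for `insn` into `out`.
   Returns 0 on success, -1 if the opcode is not recognised. */
int insn_to_c(uint16_t insn, char *out, size_t out_len)
{
    unsigned op  = (insn >> 12) & 0xFu;
    unsigned dst = (insn >> 8) & 0xFu;
    unsigned src = (insn >> 4) & 0xFu;

    switch (op) {
    case OP_MOV:  /* a plain disassembler would print: mov rD,rS */
        snprintf(out, out_len, "r%u = r%u;", dst, src);
        return 0;
    case OP_ADD:  /* a plain disassembler would print: add rD,rS */
        snprintf(out, out_len, "r%u = r%u + r%u;", dst, dst, src);
        return 0;
    case OP_SUB:  /* a plain disassembler would print: sub rD,rS */
        snprintf(out, out_len, "r%u = r%u - r%u;", dst, dst, src);
        return 0;
    default:      /* leave a marker so unknown words are not silently dropped */
        snprintf(out, out_len, "/* unknown instruction 0x%04X */", (unsigned)insn);
        return -1;
    }
}
```

Calling insn_to_c on each instruction word of a binary and printing the result gives one C statement per instruction. The decode step is identical to an ordinary disassembler; only the format strings differ, which is why writing one teaches you the instruction set along the way.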

old_timer answered Sep 30 '22 11:09