
How do I reverse engineer a compiler?

I have a compiler that emits undocumented bytecode for an undocumented VM. I'd like to be able to compile to the same VM, but I'm not sure how to go about it. How do I learn to do this? Has anyone published a log or journal of doing the same thing?

EDIT: I neglected to mention that this is the RobotC 3.0 compiler for LEGO Mindstorms. Before anyone suggests something else, I know all about NXC and similar projects, and they aren't an option since I'm helping out a FIRST FTC robotics team, which is only allowed to use RobotC or LabVIEW.

To those of you who suggested targeting the compiler's source language instead: I'm not doing that yet, because I hope the firmware exposes certain hardware features that the compiler doesn't, and because I want more custom memory management than RobotC permits.

Silas Snider asked Dec 09 '11 01:12


1 Answer

My inclination would be to disassemble the compiler. If it's written in .NET (C#, VB.NET, etc.) or Java, there are decompilers that will give you something very close to the original source code (unless it's obfuscated). Even if it's C++ and you can only get assembly, the library calls might point you in the right direction.

  • x86 Disassembly/Disassemblers and Decompilers
  • ILSpy
  • JetBrains dotPeek
  • Java Decompiler

If the bytecode is small, another option is to start with a "Hello world" program, compile it, make a small change, compile that, then diff the two bytecode results.
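The compile-change-diff loop can be sketched in Python roughly as follows. This is only an illustration: the function names (`diff_bytecode`, `diff_files`, `format_changes`) are made up for this sketch, and the sample bytes are invented, not real RobotC output. It uses the standard library's `difflib.SequenceMatcher`, which also copes with inserted or deleted bytes that shift everything after them.

```python
from difflib import SequenceMatcher


def diff_bytecode(old: bytes, new: bytes):
    """Return (tag, old_offset, old_bytes, new_offset, new_bytes) per changed region.

    tag is one of "replace", "insert", or "delete", as reported by difflib.
    """
    # autojunk=False turns off difflib's "popular element" heuristic, which
    # can otherwise hide matches in sequences of 200+ items.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, i1, old[i1:i2], j1, new[j1:j2]))
    return changes


def diff_files(old_path: str, new_path: str):
    """Read two compiled outputs from disk and diff their raw bytes."""
    with open(old_path, "rb") as f:
        old = f.read()
    with open(new_path, "rb") as f:
        new = f.read()
    return diff_bytecode(old, new)


def format_changes(changes):
    """Render each change as one hex line for eyeballing."""
    return [
        f"{tag:8} old@{i1:#06x} [{a.hex(' ')}] -> new@{j1:#06x} [{b.hex(' ')}]"
        for tag, i1, a, j1, b in changes
    ]


# Invented sample data standing in for two builds of the same program,
# where only one constant in the source was changed.
build_a = bytes([0x01, 0x02, 0x03, 0x04])
build_b = bytes([0x01, 0x02, 0x07, 0x04])
for line in format_changes(diff_bytecode(build_a, build_b)):
    print(line)
```

Changing one thing at a time in the source (a constant, an operator, one extra statement) keeps each diff small enough to guess which bytes encode what. `SequenceMatcher` is quadratic in the worst case, so this only suits small outputs.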

If you can post some more information on the specific compiler/VM, perhaps someone else has experience with that.

Edit: Given that this is a commercial product, reverse-engineering it through decompilation will likely violate its license agreement. Reverse-engineering the bytecode might not (I'm no lawyer). It sounds like you're in a Catch-22, though: if you succeed, you're no longer using RobotC or LabVIEW. If the contest only specifies that the code must run on the RobotC VM, it might be doable. Remember, though, that once you've reverse-engineered the bytecode you still have to write your own compiler before anyone can write any software. If you're doing this for fun, great; otherwise it might not be feasible.

I did find one interesting link based on a Master's Thesis: Software Reverse Engineering

TrueWill answered Oct 24 '22 20:10