Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reverse Engineer Auto-Generated C?

How easy is it to reverse engineer an auto-generated C code? I am working on a Python project and as part of my work, am using Cython to compile the code for speedup purposes.

This does help in terms of speed, yet, I am concerned that where I work, some people would try to "peek" into the code and figure out what it does.

Cython code is basically an auto-generated C. Is it very hard to reverse engineer it?

Are there any recommendations that would make the code safer and reverse-engineering harder to do? (I assume that with enough effort, everything can be reversed engineered).

like image 792
user3262424 Avatar asked Mar 31 '11 23:03

user3262424


People also ask

Can you reverse engineer C code?

The Reverser allows you to reverse engineer compilable C code to a model, which you may want to do for the following reasons: To view the structure of the C code in Modeler. To develop the C code further in Modeler before regenerating the code. To move the C code to another platform, such as C++ or Java.

Is reverse engineering always illegal?

In the United States, even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful if it has been legitimately obtained. Reverse engineering of computer software often falls under both contract law as a breach of contract as well as any other relevant laws.


2 Answers

Okay -- to attempt to answer your question more directly: most auto-generated C code is fairly ugly, so somebody would need to be fairly motivated to reverse engineer it. At the same time, I don't believe I've never looked at what Cython generates, so I'm not sure how it looks.

In addition, a lot of auto-generated code is done in the form of things like state machine tables, that most programmers find fairly difficult to follow even at best. The tendency (in many cases) is to have a generic framework, with tables of data that the framework more or less "interprets" at run-time. This isn't necessarily impossible to follow, but it's enough different from most typical code that most people will give up on it fairly quickly (and if they do much, they'll typically waste a lot of time looking at the framework instead of the data, which is what really matters in cases like this).

I'll repeat, however, that I'm pretty sure I haven't looked at what Cython produces, so I can't say much about it with any real certainty.

There are (or at least used to be) commercial obfuscators intended to make C source code difficult to understand. I suspect the availability of Perl has taken a lot of the market share from them, but if you look you may still be able to find one and use it.

Absent that, it's not terribly difficult to write an obfuscator of your own, but the degree of effectiveness will probably vary with the amount of effort you're willing to put into it. Just systematically renaming any meaningful variable names into things like _ and __ can do quite a bit (e.g., profit = sales - costs; is a lot more meaningful than _ = _I_ - _i_;). Depending on the machine generated code in question, however, this may not really accomplish much -- obfuscating a generic framework may not make much difference in understanding what your code does -- and if they figure out the procedure you're following, they may be able to simply replicate the correct framework code and transplant the pieces specific to your program into the un-obfuscated framework.

like image 64
Jerry Coffin Avatar answered Sep 27 '22 17:09

Jerry Coffin


You should really take a look at the code that Cython produces. To help with debugging, for example, it copies the complete Python source code into the generated file, marking each source line before generating C code for it. That makes it very easy to find the code section that you are interested in.

A very nice feature is that you can compile your code with the "-a" (annotate) option, and it will spit out an HTML file next to the C file that contains the annotated Python code. When you click on a line, you'll see the C code for that line. As a bonus, it marks lines that do a lot of Python processing in dark yellow, so that you get a simple indicator where to look for potential optimisations.

There's also special gdb support in Cython now, so you can do Cython source level debugging etc.

like image 34
Stefan Behnel Avatar answered Sep 27 '22 18:09

Stefan Behnel