
Why is it so easy to decompile .NET IL code?

Why is it so easy to decompile .NET IL-code into source code, compared to decompiling native x86 binaries? (Reflector produces quite good source code most of the time, while decompiling the output of a C++ compiler is almost impossible.)

Is it because IL contains a lot of metadata? Or is it because IL is a higher-level abstraction than x86 instructions? I did some research and found the following two useful articles, but neither of them answers my question.

  • MSIL Decompiler Theory
  • C Decompiler - Quick primer
asked Mar 22 '09 18:03 by compie




2 Answers

I think you've got the most important bits already.

  • As you say, there's more metadata available. I don't know the details of what is emitted by a C or C++ compiler, but I suspect far more names and similar information are included in IL. Just look at what the decompiler knows about a particular stack frame, for example: as far as x86 is concerned, you only know how the stack is used; in IL you know what the contents of the stack represent (or at least the type, not the semantic meaning!).
  • Again, as you've already mentioned, IL is a higher-level abstraction than x86. x86 has no idea what a method or function call is, or an event, or a property, etc. IL still has all that information within it.
  • Typically C and C++ compilers optimise much more heavily than (say) the C# compiler. This is because the C# compiler assumes that most of the optimisation can still be performed later - by the JIT. In some ways it makes sense for the C# compiler not to try to do much optimisation, as there are various bits of information which are available to the JIT but not the C# compiler. Optimised code is harder to decompile, because it's further away from being a natural representation of the original source code.
  • IL was designed to be JIT-compiled; x86 was designed to be executed natively (admittedly via microcode). The information the JIT compiler needs is similar to what a decompiler would want, so a decompiler has an easier time with IL. In some ways this is really just a restatement of the second point.
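
As a rough illustration of the stack and abstraction points above, here is a minimal sketch in Python of how a decompiler can rebuild an expression from stack-based bytecode. The opcodes (`ldarg`, `ldc`, `add`, `mul`, `call`, `ret`) are a simplified, hypothetical instruction set loosely modeled on IL, and `decompile` is a name invented for this example; real IL and real decompilers are considerably more involved.

```python
# Toy stack-bytecode decompiler. The opcodes are hypothetical and only
# loosely modeled on IL; this is an illustration, not real IL.

def decompile(instructions, arg_names):
    """Symbolically execute the instructions, tracking source text instead of values."""
    stack = []
    statements = []
    for op, *operands in instructions:
        if op == "ldarg":            # push a named argument
            stack.append(arg_names[operands[0]])
        elif op == "ldc":            # push a constant
            stack.append(repr(operands[0]))
        elif op in ("add", "mul"):   # binary operator: pop two, push one expression
            right = stack.pop()
            left = stack.pop()
            symbol = "+" if op == "add" else "*"
            stack.append(f"({left} {symbol} {right})")
        elif op == "call":           # the call target is named and its arity is known
            name, argc = operands
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(f"{name}({', '.join(args)})")
        elif op == "ret":            # return whatever is on top of the stack
            statements.append(f"return {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return statements


program = [
    ("ldarg", 0),
    ("ldarg", 1),
    ("ldc", 2),
    ("mul",),
    ("add",),
    ("call", "Math.Abs", 1),
    ("ret",),
]

print(decompile(program, ["width", "height"]))
# prints ['return Math.Abs((width + (height * 2)))']
```

Because every instruction says what it pushes and pops, and calls carry a name and an argument count, the expression tree falls out almost mechanically. An optimised x86 listing offers no such guarantees: the same values may live in registers, be reordered, or be inlined away.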
answered Oct 11 '22 10:10 by Jon Skeet



There are a number of things that make reverse engineering IL fairly easy.

  • Type information. This is massive. In x86 assembler, you have to infer the types of variables based on how they are used.

  • Structure. Information on the structure of the application is more available in IL disassemblies. This, combined with type information, gives you an amazing amount of data. You're working at a pretty high level at this point (relative to x86 assembler). In native assembler, you have to infer the structure layouts (and even the fact that they are structures) based on how the data is used. Not impossible, but much more time-consuming.

  • Names. Knowing the names of things can be useful.

These things, combined, mean you have quite a lot of data about the executable. IL basically works at a level much closer to the source than native code produced by a compiler does. The higher the level the bytecode works at, the easier reverse engineering is, generally speaking.
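
To make the type and structure points concrete, here is a small Python sketch using the standard `struct` module. It treats eight raw bytes the way a native disassembly presents them, with no type information, and then decodes them again using a field table. The `metadata` list is a hypothetical stand-in for the kind of type and name information .NET metadata records; it is not the real metadata format.

```python
import struct

# Eight raw bytes, as they might sit in a native stack frame or heap object.
raw = struct.pack("<if", 42, 1.5)

# Without type information, several readings are equally plausible,
# and the reverse engineer has to infer the right one from usage.
as_two_ints = struct.unpack("<ii", raw)
as_one_double = struct.unpack("<d", raw)

# With metadata (a hypothetical field table of names and type codes),
# the layout is no longer a guess.
metadata = [("count", "i"), ("ratio", "f")]
fmt = "<" + "".join(code for _, code in metadata)
names = [name for name, _ in metadata]
fields = dict(zip(names, struct.unpack(fmt, raw)))

print(as_two_ints)  # prints (42, 1069547520)
print(fields)       # prints {'count': 42, 'ratio': 1.5}
```

The bytes are identical in both cases; only the accompanying description differs. That description is what IL ships with the code and what a native binary typically discards.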

answered Oct 11 '22 10:10 by Brian Mitchell
