
Can you decompile a game down to its original source code?

I'm using Battlefield 4 just as an example; this goes for any game, really. I've been wondering if something like this is possible:

Since BF4 is running client-side, that means you have all the code that makes up the game.

Would it 'technically' be possible to decompile the code and view its source?

All the way down to the core mechanics of the game? Or is there some sort of encryption protecting it?

I do realize that if you do successfully decompile something like that it would be a mess to deal with and not organized at all, but hey, it's still the source.

Just a little something I couldn't find an answer to anywhere else.

novs12 asked Jan 02 '14 06:01



3 Answers

No, because the mapping from instructions to code is not 1:1.

No. The compiler mangles the structure of your program (there is no other word for it). Instruction scheduling and the quest to reduce register pressure at certain points can mean that instructions from the same operation end up as much as 150,000 instructions away from each other (IIRC this is the stock cap in GCC; you can change it with a -f option, of course :P).

No, no, no.

The only promise the compilation process offers is that the result will work as if it actually did what the programmers wrote. That's it.
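As a small illustration of that "as if" rule, here is a sketch that uses CPython's own bytecode compiler as a stand-in for a native compiler. That stand-in is an assumption made to keep the example runnable; GCC does far more aggressive things. The helper name `instructions` is made up for this example. On recent CPython versions, two differently written sources end up as the same instruction sequence, so nothing in the output tells you which one the programmer typed.

```python
import dis

def instructions(source):
    """Compile source and return its bytecode as (opname, argval) pairs."""
    code = compile(source, "<example>", "exec")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

# What the programmer wrote: a concatenation of two literals.
written = instructions("greeting = 'Hello, ' + 'world'")
# What the compiler effectively emits: the already-folded constant.
folded = instructions("greeting = 'Hello, world'")

# Constant folding erased the '+', so both sources give the same instructions.
print(written == folded)
for opname, argval in written:
    print(opname, argval)
```

The addition is simply gone from the compiled form, and that is one of the mildest transformations an optimising compiler performs.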

Looking at Stuxnet was interesting (yes, not a game, I know) and practical because it was small. In a game, the parts of the program driving the scene graph alone will be huge and heavily optimised. I'd also be shocked if they didn't use link-time optimisation, which removes even more of the structure.

This answer lacks a lot of detail, but that's because one explaining everything would be huge. You obviously have no idea how this works, and it's good you want to learn.

http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf

I've linked this many a time; it's got some examples of code mapping to register instructions. That code isn't optimised, and the samples are small and target a much simpler (sort of, depends how you look at it) machine. Can you see how difficult even reversing these would be?

Lastly, debugging with -O3 is a joke. We have -Og now, where the compiler optimises but avoids structure-changing optimisations, so debugging doesn't jump around so much. When you use -g, the resulting object files carry debug information that maps the generated instructions back to the source lines they came from, so a tool like objdump -S can show the source interleaved above the instructions it produced. Fun facts!

Alec Teal answered Oct 07 '22 03:10


You can't recover original source code - the process of compilation is inherently lossy and some detail will inevitably be lost. How much is lost will depend on the source language, target language and choices made by developers.
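To make "lossy" concrete, here is a minimal sketch using CPython's built-in `compile` (the helper name `compiled_bytes` and the variable names are invented for illustration). Two snippets that differ in comments, spacing and redundant parentheses compile to identical bytecode:

```python
def compiled_bytes(source):
    """Return the raw bytecode CPython generates for a module-level snippet."""
    return compile(source, "<example>", "exec").co_code

commented = "speed = (distance / time)  # metres per second, see design doc\n"
bare = "speed=distance/time\n"

# Comments, spacing and redundant parentheses never reach the bytecode,
# so no decompiler can bring them back.
print(compiled_bytes(commented) == compiled_bytes(bare))
```

Whatever a decompiler prints for this bytecode, it has no way of knowing which of the two spellings was the original.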

Let's start with the easy cases - a high-level language compiled to its own bytecode. For example, Python to .pyc, C# to .NET IL (.dll), Java to .class/.dex. In each of these examples, the bytecode contains direct representations of high-level concepts in the language such as classes, methods, virtual function calls, class layouts, etc. Decompilers exist that will restore shockingly accurate source code from the compiled code.

Here's a brief example in Python. Original source:

class MyClass:
    def function(self, a, b):
        print("Hello, world:", a, b)

MyClass().function("test", 1234.5678)

Compiled with Python 3.6, and decompiled again using uncompyle6:

# uncompyle6 version 3.3.5
# Python bytecode 3.6 (3379)
# Decompiled from: Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28) 
# [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
# Embedded file name: /private/tmp/test.py
# Compiled at: 2019-12-23 16:34:01
# Size of source mod 2**32: 121 bytes


class MyClass:

    def function(self, a, b):
        print('Hello, world:', a, b)


MyClass().function('test', 1234.5678)
# okay decompiling __pycache__/test.cpython-36.pyc

Aside from some extra comments and spaces, the output is basically 1:1 with the original. Java and C# are similarly easy to decompile. Many games are written in Java (e.g. Android) and C# (e.g. Unity), and there are a lot of modders/hackers using decompilers to obtain usable source code for games written in these languages.
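Part of the reason the round trip works so well is that the names and nesting are stored in the compiled code itself. As a rough sketch (the helper `walk_code` is made up for this example, and it inspects an in-memory code object rather than a .pyc file), you can walk the code objects CPython produces and see the class name, method name and parameter names still sitting there:

```python
import types

SOURCE = '''
class MyClass:
    def function(self, a, b):
        print("Hello, world:", a, b)
'''

def walk_code(code, depth=0):
    """Yield (depth, code object) for a code object and everything nested in it."""
    yield depth, code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from walk_code(const, depth + 1)

module_code = compile(SOURCE, "test.py", "exec")
for depth, code in walk_code(module_code):
    # co_name is the class/function name; co_varnames holds parameters and locals.
    print("  " * depth + code.co_name, code.co_varnames)
```

A decompiler like uncompyle6 has all of this to work with, which is why it only needs to reconstruct the statements inside each body.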

A developer can choose to defend against a decompiler by using obfuscation, where they deliberately mangle the compiled output in some way (e.g. renaming variables/functions/classes to gibberish names) to make this type of reverse engineering harder.
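As a toy sketch of the renaming idea (not how any particular commercial obfuscator works), the following uses Python's `ast` module to replace parameter and local variable names with meaningless ones. The class `RenameLocals`, the helper `obfuscate` and the sample function `total_price` are all invented for this example, and `ast.unparse` needs Python 3.9 or newer.

```python
import ast

SOURCE = '''
def total_price(unit_price, quantity, tax_rate):
    subtotal = unit_price * quantity
    tax = subtotal * tax_rate
    return subtotal + tax
'''

class RenameLocals(ast.NodeTransformer):
    """Rename function parameters and assigned variables to gibberish names."""

    def __init__(self):
        self.mapping = {}

    def _new_name(self, old):
        if old not in self.mapping:
            self.mapping[old] = f"_{len(self.mapping):x}"
        return self.mapping[old]

    def visit_arg(self, node):
        node.arg = self._new_name(node.arg)
        return node

    def visit_Name(self, node):
        # Rename anything already mapped, plus any newly assigned name.
        # Names that are only read (builtins, globals) are left alone.
        if node.id in self.mapping or isinstance(node.ctx, ast.Store):
            node.id = self._new_name(node.id)
        return node

def obfuscate(source):
    tree = RenameLocals().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

obfuscated_source = obfuscate(SOURCE)
print(obfuscated_source)

# The renamed function still computes the same result.
original_ns, obfuscated_ns = {}, {}
exec(SOURCE, original_ns)
exec(obfuscated_source, obfuscated_ns)
print(original_ns["total_price"](2.0, 3, 0.5) == obfuscated_ns["total_price"](2.0, 3, 0.5))
```

The function name is kept here so callers still work; this toy would break callers that pass arguments by keyword, which is one of many details a real obfuscator has to handle.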


The harder case is when you take code and compile it all the way down to machine code (code that runs directly on the CPU). Languages like Rust, Go, C++ and Swift all compile straight to machine code by default. CPU instructions don't correspond 1-to-1 to concepts in the high-level language. Now, there are decompilers - the NSA's recently open-sourced Ghidra decompiler is one of the best out there - but they can only give you a very crude approximation of the original source, and most only decompile to C (not all the way to Rust/Go/C++/Swift/etc.). Here's a simple C++ program:

#include <iostream>

class MyClass {
public:
  void function(const char *a, const double b) {
    std::cout << "Hello, world: " << a << " " << b << std::endl;
  }
};

int main() {
  MyClass m;
  m.function("test", 1234.5678);
}

Here's how Ghidra 9.1 decompiles it:


// MyClass::function(char const*, double)

void __thiscall MyClass::function(MyClass *this,char *param_1,double param_2)

{
  char cVar1;
  basic_ostream *pbVar2;
  size_t sVar3;
  long *plVar4;
  long *plVar5;
  undefined local_20 [8];
  
  pbVar2 = std::__1::__put_character_sequence<char,std--__1--char_traits<char>>
                     ((basic_ostream *)__ZNSt3__14coutE,"Hello, world: ",0xe);
  sVar3 = __stubs::_strlen(param_1);
  pbVar2 = std::__1::__put_character_sequence<char,std--__1--char_traits<char>>
                     (pbVar2,param_1,sVar3);
  pbVar2 = std::__1::__put_character_sequence<char,std--__1--char_traits<char>>(pbVar2," ",1);
  plVar4 = (long *)__stubs::__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEElsEd(param_2,pbVar2);
  __stubs::__ZNKSt3__18ios_base6getlocEv(local_20,*(long *)(*plVar4 + -0x18) + (long)plVar4);
  plVar5 = (long *)__stubs::__ZNKSt3__16locale9use_facetERNS0_2idE(local_20,__ZNSt3__15ctypeIcE2idE)
  ;
  cVar1 = (**(code **)(*plVar5 + 0x38))(plVar5,10);
  __stubs::__ZNSt3__16localeD1Ev(local_20);
  __stubs::__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE3putEc(plVar4,(ulong)(uint)(int)cVar1);
  __stubs::__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE5flushEv(plVar4);
  return;
}


undefined8 entry(void)

{
  MyClass local_10 [8];
  
  MyClass::function(local_10,"test",1234.56780000);
  return 0;
}

An experienced reverse engineer can make sense of this - but it's a lot less nice.

So there you have it. If you're reverse engineering a program compiled to native CPU code, you can get source but it's going to be pretty rough. If you're reverse engineering a program compiled to some intermediate bytecode, you'll have a better time. In all cases, you can't get exactly the original source code, but you might be able to get pretty close.

nneonneo answered Oct 07 '22 03:10


The other answers aren't accurate.

There are several reverse engineering projects out there which reconstruct C code that compiles to exactly the same bytes as the original binary, given the original compiler. Please see https://github.com/pret/pokeemerald . Of course you lose names and comments, but it is not accurate to answer this question with a flat no. It's perfectly possible to construct recompilable, matching C code (purely in this narrow case, anyway); it's just really tedious, and a question of permuting through candidate C code fast enough to find a version that compiles to the matching bytes.
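The matching loop can be sketched in a few lines. This is only an analogy: it uses CPython's `compile` as the "original compiler" and Python bytecode instead of machine code, and the names `fingerprint`, `find_match`, `TARGET` and `CANDIDATES` are invented here, not taken from the pret tooling. The idea is the same, though: compile each guess and keep the one whose output is byte-for-byte identical to the target.

```python
def fingerprint(source, name="f"):
    """Compile `source` and return the bytes and constants of function `name`."""
    namespace = {}
    exec(compile(source, "<candidate>", "exec"), namespace)
    code = namespace[name].__code__
    return code.co_code, code.co_consts

# Pretend this is all we have: the compiled form of an unknown function.
TARGET = fingerprint("def f(scale, offset):\n    return scale * 2 + offset\n")

# Hand-written guesses at the original, tried in order.
CANDIDATES = [
    "def f(a, b):\n    return b + a * 2\n",
    "def f(a, b):\n    return a + a + b\n",
    "def f(a, b):\n    return a * 2 + b\n",
]

def find_match(target, candidates):
    """Return the first candidate that compiles to exactly the target, else None."""
    for source in candidates:
        if fingerprint(source) == target:
            return source
    return None

match = find_match(TARGET, CANDIDATES)
print(match)
```

Note that the matching candidate uses `a` and `b` rather than the original parameter names: the bytes match even though the names are gone, which is the sense in which such a project is "matching" without being the literal original source.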

The actual answer? Yes. Will you be able to reasonably find 1:1 matching members for every function? Probably not.

Revo answered Oct 07 '22 03:10