
Challenge: maximize cost of obfuscation's reverse engineering

Disclaimer: Similar questions have been asked a number of times on SO; however, this question is much more specific, and has not been adequately addressed so far.

We're developing a new packaged software product, which, for business security reasons, must run on our customer's server, in PHP. The software is sold with a per-user end-license; the price range is $20-80 per user, and the target market is small (and web-savvy) consultancies and IT agencies.

To discourage piracy (e.g. removing the user-license enforcement), we'd like to maximize the protection of the PHP code by any means technologically available that does not inconvenience the user.

Let's break this down:

  • does not inconvenience the user: no additional server-side installs (no zend decoder, or other binaries). Has to run on a plain-vanilla shared PHP host out-of-the-box.

  • Maximize the protection: the cost of breaking the protection has to outweigh the cost of buying an additional license. That is, it has to take at least 3-5 working days for a professional hacker to remove the user license protection.

  • Any means technologically available: might call home, might use high-end crypto, might implement a c64 emulator.

To pro-actively address the so far highest-voted non-solutions:

  • NOT looking for perfect obfuscation, just extremely hard ones (defined as: have to take at least 3-5 working days to decrypt), OR other anti-piracy methods

  • NOT looking for "black-box" software packages whose inner workings I don't know, and whose fitness for our purpose I can't determine; looking for algorithmic, out-of-the-box ideas.

  • NOT looking for license/law-side protection, we already have that covered.

  • We DO know, that given enough time, and focus, all obfuscation will be hacked sooner or later; we merely want this not to be the economical solution.

Given the above constraints, what methods, or ideas would you use to maximize anti-piracy measures?

Bounty-hunt: point goes for the hardest algorithmic method to reverse-engineer the code, given the constraints above.

Update / Bounty-hunt: I've accepted Ira Baxter's answer, mostly because the rest failed to answer the core question, and attempted to question the underlying assumptions (business, closed source, yadda yadda). Thanks all!

Silver Dragon asked Feb 21 '11 18:02



1 Answer

I think what you want to do is to transform the code algorithmically, to obfuscate not only what is executed, but also the data structures. We assume we start with a clean version of the program, produced by the developer. He always works with the clean version. Obfuscation produces the to-ship version. Good obfuscation will produce a to-ship version with exactly the same functionality as the original, so no further testing is (arguably) needed.

For control flow scrambling, the idea is to take the nicely written code you have at the start, and push it through transformations that make static (and human) analysis of the decisions that control the flow difficult, by multiplying the set of assumptions that have to be analyzed. For instance, if you have two pointers, and store a value through one, can it affect the value seen by the other? Depending on whether the pointers are aliased or not, you can get two different answers. Now take N pointers, each of which may be aliased; you get 2^N possible aliasing relations. If the reader doesn't know the exact combination, he won't be able to determine if a decision might be true, false or conditional. Of course, the tool that generates this produces conditionals whose outcome it knows, because it designs (generates) the pointer rat's nest to produce a specific outcome.
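As a rough illustration of the aliasing idea above (written in Python for brevity rather than PHP, and not taken from any real obfuscator), the sketch below builds a small "rat's nest" of references, some of which secretly share storage. The generator records whether a write through one slot is visible through another, so it knows the predicate's outcome in advance, while a reader has to untangle the aliasing to find out. All names (`build_opaque_predicate`, `evaluate_predicate`) are hypothetical.

```python
import random


def build_opaque_predicate(n_cells, seed):
    """Generator side: build aliased cells and record the known outcome.

    Returns (cells, write_idx, read_idx, known_outcome).
    """
    rng = random.Random(seed)
    # Each one-element list stands in for a pointer to separate storage.
    cells = [[0] for _ in range(n_cells)]
    # Randomly make some slots refer to the same storage as an earlier slot.
    for i in range(1, n_cells):
        if rng.random() < 0.5:
            cells[i] = cells[rng.randrange(i)]
    write_idx = rng.randrange(n_cells)
    read_idx = rng.randrange(n_cells)
    # The generator can inspect the aliasing directly, so it knows the answer.
    known_outcome = cells[write_idx] is cells[read_idx]
    return cells, write_idx, read_idx, known_outcome


def evaluate_predicate(cells, write_idx, read_idx):
    """Shipped-code side: the decision as it would appear in the program."""
    for cell in cells:
        cell[0] = 0
    cells[write_idx][0] = 1          # store through one "pointer"
    return cells[read_idx][0] == 1   # is the store visible through the other?


if __name__ == "__main__":
    cells, w, r, known = build_opaque_predicate(8, seed=7)
    print(evaluate_predicate(cells, w, r) == known)
```

In a real tool the aliasing structure would be spread across the program instead of sitting in one list, which is what makes the decision expensive to analyze.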

See Code Obfuscation Literature Survey (not my paper), which discusses a variety of control flow and data flow obfuscations. This is likely not the most recent summary of what is possible, but it's pretty instructive. You should note that doing this kind of obfuscation has some impact on execution time.

What the papers on this topic make clear is that control and data flow obfuscated programs are extremely hard for static analyzers to "understand"; the papers provide/reference demonstrations of the algorithmic complexity of processing such obfuscated programs.

Now, you might argue that people aren't static analyzers and therefore don't suffer the same limitations. You might be right; Roger Penrose famously argues that people do not have the same constraints as Turing machines; the argument isn't settled by a long shot. But the entire foundation of encryption/hashing technology is built on essentially the same kind of computational complexity arguments. And to date, nobody has proven smart enough to crack these technologies in ways that can be used in daily life by thieves (good thing, or your bank accounts would be empty).

To do this to a PHP program, you need tools that can parse the PHP code, and carry out such transformations. Our DMS Software Reengineering Toolkit has robust PHP parsers, and can apply very complex transformations to code. To do this really well, you want to apply the transformations globally across all your code, not just on a file-by-file basis. We don't have this kind of obfuscation transformation implemented on PHP, but if you really wanted to do it, this would be the way. We have applied complex transformations to PHP programs for other commercial products that we sell.

When you are all done, ideally you'd compile this result to machine code, say using the HipHop compiler. (Just compiling would defeat some folks, but not the serious software engineers).

EDIT: Obfuscation != AntiPiracy is a theme in other answers. So how does obfuscation help?

First you need to deal with the anti-piracy issue. The obvious things to do are:

  • Add copyright comments to each file. These serve as warnings to thieves. Not good ones.
  • Add copyright strings in various places and print them out occasionally; these will end up in memory and play a role if a pirate steals the code: he stole this string, too.
  • Add a string to your application saying, "licensed to ". This makes your customer unenthusiastic about letting it be stolen.
  • Add a check to your application that it is running on the intended customer's machine. (Since your app is intended to be very cheap, you'll probably need to automate a registration process)
  • Have the application phone home with its machine ID occasionally.
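The "licensed to" string and the machine check in the list above can be tied together with a keyed hash. The following is a minimal sketch under stated assumptions, shown in Python for brevity rather than PHP: `SECRET_KEY`, `machine_id`, `issue_license` and `check_license` are hypothetical names, the machine ID is simply derived from a hostname and install path, and the vendor-side secret is embedded in the shipped code (which is exactly the part obfuscation would have to hide).

```python
import hashlib
import hmac

# Hypothetical shared secret; for illustration only.
SECRET_KEY = b"vendor-secret-for-illustration-only"


def machine_id(hostname, install_path):
    """Derive a short identifier for the customer's installation (assumed scheme)."""
    digest = hashlib.sha256(f"{hostname}|{install_path}".encode()).hexdigest()
    return digest[:16]


def _tag(payload):
    """Keyed hash over the license payload."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_license(licensee, mid):
    """Vendor side: bind the licensee name to one machine ID."""
    payload = f"{licensee}|{mid}"
    return f"{payload}|{_tag(payload)}"


def check_license(token, current_mid):
    """Application side: return the licensee name if valid here, else None."""
    parts = token.rsplit("|", 2)
    if len(parts) != 3:
        return None
    licensee, mid, tag = parts
    if not hmac.compare_digest(tag, _tag(f"{licensee}|{mid}")):
        return None  # name or machine ID was tampered with
    if mid != current_mid:
        return None  # valid license, but for a different machine
    return licensee


if __name__ == "__main__":
    mid = machine_id("www.example.com", "/var/www/app")
    token = issue_license("Acme Consulting", mid)
    print("licensed to", check_license(token, mid))
```

Because the licensee name is covered by the tag, a pirate cannot change the printed owner without also defeating the check, which is what gives the original customer a stake in not leaking the copy.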

Now, these steps are meant to prevent someone (legally and technically) from stealing your code. If this is all you have, an unfazed pirate will simply remove the technical checks and it's stolen.

It is very hard to prevent somebody from copying the bit stream that makes up your product; computers are far too good at copying. So your goal is to arrange for it to be hard for him to derive value if he does, and that's where obfuscation comes in.

If the code is sufficiently obfuscated, he will have a difficult time locating the license check and phone-home mechanisms to disable them. (I suggest several checks, none of them always called, to make it hard for the thief to tell when he is successful.) The obfuscation, well done, should protect the printing of the original owner's name, which means the original owner will have some interest in preventing it from being stolen, as you'll name him along with the pirate in any lawsuit.
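One way to picture "several checks, none of them always called" is a wrapper that runs a license check only some of the time before an ordinary function. This is a hedged sketch in Python (chosen for brevity, not PHP); `sometimes_checked` and `LicenseError` are hypothetical names, and a real product would scatter and disguise such checks rather than use one visible decorator.

```python
import functools
import random


class LicenseError(Exception):
    """Raised when a sporadic license check fails."""


def sometimes_checked(check, probability, rng=random):
    """Wrap a function so `check()` runs on roughly `probability` of its calls."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Only occasionally pay for (and reveal) the check.
            if rng.random() < probability and not check():
                raise LicenseError("license check failed")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def license_is_valid():
    """Placeholder for a real check; always passes in this sketch."""
    return True


@sometimes_checked(license_is_valid, probability=0.1)
def export_report(rows):
    return len(rows)


if __name__ == "__main__":
    print(export_report([1, 2, 3]))
```

Since a cracked copy fails only intermittently, the thief cannot easily confirm that every check has been found.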

If they defeat the licenses, copyright printing, and phone-home mechanisms, and simply want to run it in the back room without telling you, you might be stuck. (For $80.00, I can't imagine why they'd go to all this trouble just for this effect.) But many thieves want to modify the software to "improve" it, especially if they want your market. Serious obfuscation will prevent them from doing this; it will even make it hard for them to add their own license controls. That limits the value pretty severely.

They may simply steal it and release it to the world for free; your hope here is that the application is hard to crack. If they succeed, your only good defense is a continuing stream of upgrades that licensed owners get.

Obfuscation is a key to successful piracy defense, IMHO.

Ira Baxter answered Nov 02 '22 18:11
