Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does one go about reverse engineering an algorithm?

I'm wondering how does one go about reversing an algorithm such as one for storing logins or pin codes.

Lets say I have an amount of data where:

7262627 -> ? -> 8172

5353773 -> ? -> 1132

etc. This is just an example. Or say a hex string that is tansformed into another.

&h8712 -> &h1283 or something like that.

How do I go about starting to figure out what that algorithm is? Where does one start?

Would you start trying different shifts, xors and hope something stands out? I'm sure there's a better way as this seems like stabbing in the dark.

Is it even practically possible to reverse engineer this kind of algorithm?

Sorry if this is a stupid question. Thanks for your help / pointers.

like image 596
Evelyn Avatar asked Nov 13 '10 01:11

Evelyn


3 Answers

There are a few things people try:

  • Get the source code, or disassemble an executable.
  • Guess, based on the hash functions other people use. For example, a hash consisting of 32 hex digits might well be one or more repetitions of MD5, and if you can get a single input/output pair then it is quite easy to confirm or refute this (although see "salt", below).
  • Statistically analyze a large number of pairs of inputs and outputs, looking for any kind of pattern or correlations, and relate those correlations to properties of known hash functions and/or possible operations that the designer of the system might have used. This is beyond the scope of a single technique, and into the realms of general cryptanalysis.
  • Ask the author. Secure systems don't usually rely on the secrecy of the hash algorithms they use (and don't usually stay secure long if they do). The examples you give are quite small, though, and secure hashing of passwords would always involve a salt, which yours apparently don't. So we might not be talking about the kind of system where the author is confident to do that.

In the case of a hash where the output is only 4 decimal digits, you can attack it simply by building a table of every possible 7 digit input, together with its hashed value. You can then reverse the table and you have your (one-to-many) de-hashing operation. You never need to know how the hash is actually calculated. How do you get the input/output pairs? Well, if an outsider can somehow specify a value to be hashed, and see the result, then you have what's called a "chosen plaintext", and an attack relying on that is a "chosen plaintext attack". So a 7 digit -> 4 digit hash would be very weak indeed if it was used in a way which allowed chosen plaintext attacks to generate a lot of input/output pairs. I realise that's just one example, but it's also just one example of a technique to reverse it.

Note that reverse engineering the hash, and actually reversing it, are two different things. You could figure out that I'm using SHA-256, but that wouldn't help you reverse it (i.e., given an output, work out the input value). Nobody knows how to fully reverse SHA-256, although of course there are always rainbow tables (see "salt", above) <conspiracy>At least nobody admits they do, so it's no use to you or me.</conspiracy>

like image 191
Steve Jessop Avatar answered Nov 15 '22 10:11

Steve Jessop


Probably, you can't. Suppose the transformation function is known, something like

function hash(text):
    return sha1("secret salt"+text)

But the "secret salt" is not known, and is cryptographically strong (a very large, random integer). You could never brute force the secret salt from even a very large number of plain-text, crypttext pairs.

In fact, if the precise hash function used was known to be one of two equally strong functions, you could never even get a good guess between which one was being used.

like image 3
SingleNegationElimination Avatar answered Nov 15 '22 11:11

SingleNegationElimination


Stabbing in the dark will drive you to insanity. There are some algorithms that, given current understanding, you couldn't hope to deduce the inner workings of between now and the [predicted] end of the universe without knowing the exact details (potentially including private keys or internal state). Of course, some of these algorithms are the foundations of modern cryptography.

If you know in advance that there's a pattern to be discovered though, there are sometimes ways of approaching this. For instance, if the dataset contains several input values that differ by 1, compare the corresponding output values:

7262627 -> 8172
7262628 -> 819
7262629 -> 1732
...
7262631 -> 3558

Here it's fairly clear (given a few minutes and a calculator) that when the input increases by 1, the output increases by 913 modulo 8266 (i.e. a simple linear congruential generator).

Differential cryptanalysis is a relatively modern technique used to analyse the strength of cryptographic block ciphers, relying on a similar but more complex idea for where the cipher algorithm is known, but it's assumed the private key isn't. Input blocks differing from each other by a single bit are considered and the effect of that bit is traced through the cipher to deduce how likely each output bit is to "flip" as a result.

Other ways of approaching this kind of problem would be to look at the extremes (maximum, minimum values), distribution (leading to frequency analysis), direction (do the numbers always increase? decrease?) and (if this is allowed) consider the context in which the data sets were found. For instance, some types of PIN codes always contain a repeated digit to make them easier to remember (I'm not saying a PIN code can necessarily be deduced from anything else - just that a repeated digit is one less digit to worry about!).

like image 2
SimonJ Avatar answered Nov 15 '22 11:11

SimonJ