Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cryptanalysis: XOR of two plaintext files

I have a file which contains the result of two XORed plaintext files. How do I attack this file in order to decrypt either of the plaintext files? I have searched quite a bit, but could not find any answers. Thanks!

EDIT:

Well, I also have the two ciphertexts which i XORed to get the XOR of the two plaintexts. The reason I ask this question, is because, according to Bruce Schneier, pg. 198, Applied Cryptography, 1996 "...she can XOR them together and get two plaintext messages XORed with each other. This is easy to break, and then she can XOR one of the plaintexts with the ciphertext to get the keystream." (This is in relation to a simple stream cipher) But beyond that he provided no explanation. Which is why I asked here. Forgive my ignorance.

Also, the algorithm used is a simple one, and a symmetric key is used whose length is 3.

FURTHER EDIT:

I forgot to add: Im assuming that a simple stream cipher was used for encryption.

like image 545
OckhamsRazor Avatar asked Apr 14 '11 22:04

OckhamsRazor


3 Answers

I'm no cryptanalyst, but if you know something about the characteristics of the files you might have a chance.

For example, lets assume that you know that both original plaintexts:

  • contain plain ASCII English text
  • are articles about sports (or whatever)

Given those 2 pieces of information, one approach you might take is to scan through the ciphertext 'decrypting' using words that you might expect to be in them, such as "football", "player", "score", etc. Perform the decryption using "football" at position 0 of the ciphertext, then at position 1, then 2 and so on.

If the result of decrypting a sequence of bytes appears to be a word or word fragment, then you have a good chance that you've found plaintext from both files. That may give you a clue as to some surrounding plaintext, and you can see if that results in a sensible decryption. And so on.

Repeat this process with other words/phrases/fragments that you might expect to be in the plaintexts.


In response to your question's edit: what Schneier is talking about is that if someone has 2 ciphertexts that have been XOR encrypted using the same key, XORing those ciphertexts will 'cancel out' the keystream, since:

(A ^ k) - ciphertext of A
(B ^ k) - ciphertext of B

(A ^ k) ^ (B ^ k) - the two ciphertexts XOR'ed together which simplifies to:

A ^ B ^ k ^ k - which continues to simplify to
A ^ B ^ 0
A ^ B

So now, the attacker has a new ciphertext that's composed only of the two plaintexts. If the attacker knows one of the plaintexts (say the attacker has legitimate access to A, but not B), that can be used to recover the other plaintext:

A ^ (A ^ B)
(A ^ A) ^ B
0 ^ B
B

Now the attacker has the plaintext for B.

It's actually worse than this - if the attacker has A and the ciphertext for A then he can recover the keystream already.

But, the guessing approach I gave above is a variant of the above with the attacker using (hopefully good) guesses instead of a known plaintext. Obviously it's not as easy, but it's the same concept, and it can be done without starting with known plaintext. Now the attacker has a ciphertext that 'tells' him when he's correctly guessed some plaintext (because it results in other plaintext from the decryption). So even if the key used in the original XOR operation is random gibberish, an attacker can use the file that has that random gibberish 'removed' to gain information when he's making educated guesses.

like image 141
Michael Burr Avatar answered Nov 10 '22 16:11

Michael Burr


You need to take advantage of the fact that both files are plain text. There is a lot of implications which can be derived from that fact. Assuming that both texts are English texts, you can use fact that some letters are much more popular than the others. See this article.

Another hint is to note the structure of correct English text. For example, every time one statements ends, and next begins you there is a (dot, space, capital letter) sequence.

Note that in ASCII code, space is binary "0010 0000" and changing that bit in a letter will change the letter case (lower to upper and vice versa). There will be a lot of XORing using space, if both files are plain text, right? Analyse printable characters table on this page.

Also, at the end you can use spell checker.

I know I didn't provide a solution for your question. I just gave you some hints. Have fun, and please share your findings. It's really an interesting task.

like image 42
Michał Šrajer Avatar answered Nov 10 '22 16:11

Michał Šrajer


That is interesting. The Schneier book does indeed say that it is easy to break this. And then he kind of leaves it hanging at that. I guess you have to leave some exercises up to the reader!

There is an article by Dawson and Nielson that apparently describes an automated process for this task for text files. It's a bit on the $$ side to buy the single article. However, a second paper titled A Natural Language Approach to Automated Cryptanalysis of Two-time Pads references the Dawson and Nielsen work and describes some assumptions they made (primarily that the text was limited to 27 characters). But this second paper appears to be freely available and describes their own system. I don't know for sure that it is free, but it is openly available on a Johns Hopkins University server.

That paper is about 10 pages long and looks interesting. I don't have time to read it at the moment but may later. I find it interesting (and telling) that it takes a 10 page paper to describe a task that another cryptographer describes as "easy".

like image 35
Mark Wilkins Avatar answered Nov 10 '22 16:11

Mark Wilkins