Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find a checksum of the same checksum? (job-interview question)

Tags:

Devise a simple algorithm which creates a file which contains nothing but its own checksum.

Let's say it is CRC-32, so this file must be 4 bytes long.

like image 819
psihodelia Avatar asked Mar 31 '10 15:03

psihodelia


People also ask

What is checksum with example?

A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.

How is checksum generated?

To produce a checksum, you run a program that puts that file through an algorithm. Typical algorithms used for this include MD5, SHA-1, SHA-256, and SHA-512. The algorithm uses a cryptographic hash function that takes an input and produces a string (a sequence of numbers and letters) of a fixed length.

How does file checksum work?

A checksum is the outcome of running an algorithm, called a cryptographic hash function, on a piece of data, usually a single file. Comparing the checksum that you generate from your version of the file, with the one provided by the source of the file, helps ensure that your copy of the file is genuine and error free.

What is checksum verification?

A checksum is a value used to verify the integrity of a file or a data transfer. In other words, it is a sum that checks the validity of data. Checksums are typically used to compare two sets of data to make sure they are the same.


2 Answers

There might be some smart mathematical way of finding it out (or proving that none exists), if you know how the algorithm works.

But since I'm lazy and CRC32 has only 2^32 values, I would brute force it. While waiting for the algorithm to go through all 2^32 values, I would use Google and Stack Overflow to find whether somebody has a solution to it.

In case of SHA-1, MD5 and other more-or-less cryptographically secure algorithms, I would get intimidated by the mathematicians who designed those algorithms and just give up.

EDIT 1: Brute forcing... This far I've found one; CC4FBB6A in big-endian encoding. There might still be more. I'm checking 4 different encodings: ASCII uppercase and lowercase, and binary big-endian and little-endian.

EDIT 2: Brute force done. Here are the results:

CC4FBB6A (big-endian)
FFFFFFFF (big-endian & little-endian)
32F3B737 (uppercase ASCII)

The code is here. On my overclocked C2Q6600 that takes about 1.5 hours to run. Now that program is single-threaded, but it would be easy to make it multi-threaded, which would give a nice linear scalability.

like image 90
Esko Luontola Avatar answered Oct 27 '22 12:10

Esko Luontola


Aside from Jerry Coffin and Esko Luontola's good answers to an unusual problem, I'd like to add:

Mathematically, we're looking for X such that F(X) = X, where F is the checksum function, and X is the data itself. Since the checksum's output is of fixed size, and the input we are looking for is of the same size, there is no guarantee that such an X even exists! It could very well be that every input value of the fixed size is correlated with a different value of that size.

EDIT: Your question didn't specify the exact way the checksum is supposed to be formatted within the file, so I assumed you mean the byte-representation of the checksum. When strings and encodings and formatted-strings come to play, things become more complex.

like image 35
M.A. Hanin Avatar answered Oct 27 '22 12:10

M.A. Hanin