Devise a simple algorithm which creates a file which contains nothing but its own checksum.
Let's say it is CRC-32, so this file must be 4 bytes long.
A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.
To produce a checksum, you run a program that puts that file through an algorithm. Typical algorithms used for this include MD5, SHA-1, SHA-256, and SHA-512. The algorithm uses a cryptographic hash function that takes an input and produces a string (a sequence of numbers and letters) of a fixed length.
A checksum is the outcome of running an algorithm, called a cryptographic hash function, on a piece of data, usually a single file. Comparing the checksum that you generate from your version of the file, with the one provided by the source of the file, helps ensure that your copy of the file is genuine and error free.
A checksum is a value used to verify the integrity of a file or a data transfer. In other words, it is a sum that checks the validity of data. Checksums are typically used to compare two sets of data to make sure they are the same.
There might be some smart mathematical way of finding it out (or proving that none exists), if you know how the algorithm works.
But since I'm lazy and CRC32 has only 2^32 values, I would brute force it. While waiting for the algorithm to go through all 2^32 values, I would use Google and Stack Overflow to find whether somebody has a solution to it.
In case of SHA-1, MD5 and other more-or-less cryptographically secure algorithms, I would get intimidated by the mathematicians who designed those algorithms and just give up.
EDIT 1: Brute forcing... This far I've found one; CC4FBB6A in big-endian encoding. There might still be more. I'm checking 4 different encodings: ASCII uppercase and lowercase, and binary big-endian and little-endian.
EDIT 2: Brute force done. Here are the results:
CC4FBB6A (big-endian)
FFFFFFFF (big-endian & little-endian)
32F3B737 (uppercase ASCII)
The code is here. On my overclocked C2Q6600 that takes about 1.5 hours to run. Now that program is single-threaded, but it would be easy to make it multi-threaded, which would give a nice linear scalability.
Aside from Jerry Coffin and Esko Luontola's good answers to an unusual problem, I'd like to add:
Mathematically, we're looking for X such that F(X) = X, where F is the checksum function, and X is the data itself. Since the checksum's output is of fixed size, and the input we are looking for is of the same size, there is no guarantee that such an X even exists! It could very well be that every input value of the fixed size is correlated with a different value of that size.
EDIT: Your question didn't specify the exact way the checksum is supposed to be formatted within the file, so I assumed you mean the byte-representation of the checksum. When strings and encodings and formatted-strings come to play, things become more complex.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With