Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Chicken/Egg problem: Hash of file (including hash) inside file! Possible?

Thing is I have a file that has room for metadata. I want to store a hash for integrity verification in it. Problem is, once I store the hash, the file and the hash along with it changes.

I perfectly understand that this is by definition impossible with one way cryptographic hash methods like md5/sha.

I am also aware of the possibility of containers that store verification data separated from the content as zip & co do.

I am also aware of the possibility to calculate the hash separately and send it along with the file or to append it at the end or somewhere where the client, when calculating the hash, ignores it.

This is not what I want.

I want to know whether there is an algorithm where its possible to get the resulting hash from data where the very result of the hash itself is included.

It doesn't need to be cryptographic or fullfill a lot of criterias. It can also be based on some heuristics that after a realistic amount of time deliver the desired result.

I am really not so into mathematics, but couldn't there be some really advanced exponential modulo polynom cyclic back-reference devision stuff that makes this possible?

And if not, whats (if there is) the proof against it?

The reason why i need tis is because i want (ultimately) to store a hash along with MP4 files. Its complicated, but other solutions are not easy to implement as the file walks through a badly desigend production pipeline...

like image 758
The Surrican Avatar asked Dec 13 '10 19:12

The Surrican


3 Answers

It's possible to do this with a CRC, in a way. What I've done in the past is to set aside 4 bytes in a file as a placeholder for a CRC32, filling them with zeros. Then I calculate the CRC of the file.

It is then possible to fill the placeholder bytes to make the CRC of the file equal to an arbitrary fixed constant, by computing numbers in the Galois field of the CRC polynomial.

(Further details possible but not right at this moment. You basically need to compute (CRC_desired - CRC_initial) * 2-8*byte_offset in the Galois field, where byte_offset is the number of bytes between the placeholder bytes and the end of the file.)


Note: as per @KeithS's comments this solution is not to prevent against intentional tampering. We used it on one project as a means to tie metadata within an embedded system to the executable used to program it -- the embedded system itself does not have direct knowledge of the file(s) used to program it, and therefore cannot calculate a CRC or hash itself -- to detect inadvertent mismatch between an embedded system and the file used to program it. (In later systems I've just used UUIDs.)

like image 75
Jason S Avatar answered Nov 19 '22 06:11

Jason S


Of course this is possible, in a multitude of ways. However, it cannot prevent intentional tampering.

For example, let

hash(X) = sum of all 32-bit (non-overlapping) blocks of X modulo 65521. 

Let

Z = X followed by the 32-bit unsigned integer (hash(X) * 65521)

Then

hash(Z) == hash(X) == last 32-bits of Z

The idea here is just that any 32-bit integer congruent to 0 modulo 65521 will have no effect on the hash of X. Then, since 65521 < 2^16, hash has a range less then 2^16, and there are at least 2^16 values less than 2^32 congruent to 0 modulo 65521. And so we can encode the hash into a 32 bit integer that will not affect the hash. You could actually use any number less than 2^16, 65521 just happens to be the largest such prime number.

like image 23
Chris Hopman Avatar answered Nov 19 '22 04:11

Chris Hopman


I remember an old DOS program that was able to embed in a text file the CRC value of that file. However, this is possible only with simple hash functions.
Altough in theory you could create such file for any kind of hash function (given enough time or the right algorithm), the attacker would be able to use exactly the same approach. Even more, he would have a chose: to use exactly your approach to obtain such file, or just to get rid of the check.

It means that now you have two problems instead of one, and both should be implemented with the same complexity. It's up to you to decide if it worth it.

EDIT: you could consider hashing some intermediary results (like RAW decoded output, or something specific to your codec). In this way the decoder would have it anyway, but for another program it would be more difficult to compute.

like image 1
ruslik Avatar answered Nov 19 '22 04:11

ruslik