How are Hash Trees useful?

I was reading about hash trees on Wikipedia, and I don't understand the benefits or purpose of this structure: they seem to require more hashes than just one per leaf, with no significant use for the extra hashes.

For example, the use case given on Wikipedia is validating data received in a P2P system. But why is this better than a simple one-to-one mapping of block numbers to their hashes, without the tree structure?

Could someone please explain how and why hash trees are useful?

Thanks in advance,

Moshe

Moshe asked Nov 12 '12 at 01:11



1 Answer

  1. Hash trees can be computed in parallel. If you have two blocks of data to hash, you can use two processors to compute the hash twice as fast. This only helps if hashing is slower than your I/O, which is unlikely.

  2. A tree hash can be computed from the hashes of individual blocks, or from the hashes of larger sections, as long as those sections are aligned correctly. This is the important part.
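Both points can be illustrated with a minimal Python sketch. The construction is an assumption, not something the answer specifies: 1 MiB leaves hashed with SHA-256, each parent = SHA-256(left || right), and an odd hash at the end of a level carried up unchanged. `leaf_hashes` and `tree_root` are hypothetical names. The leaves are hashed on a thread pool (CPython's `hashlib` releases the GIL while hashing large buffers), and the last lines check that the root equals the hash of the roots of the two aligned halves.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # assumed leaf size: 1 MiB


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(data: bytes, workers: int = 4) -> list[bytes]:
    """Hash each chunk independently; the chunks do not depend on each other,
    so they can be hashed on several threads at once."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256, chunks))


def tree_root(hashes: list[bytes]) -> bytes:
    """Combine hashes pairwise, level by level, until one root remains.
    An odd hash at the end of a level is carried up unchanged (an assumption)."""
    level = list(hashes)
    while len(level) > 1:
        parents = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])
        level = parents
    return level[0]


# A 4 MiB file has 4 leaves. Its root can be rebuilt from the roots of its
# two aligned 2 MiB halves, without touching the individual leaf hashes again.
data = bytes(range(256)) * (4 * CHUNK_SIZE // 256)
leaves = leaf_hashes(data)
left_half, right_half = tree_root(leaves[:2]), tree_root(leaves[2:])
assert tree_root(leaves) == sha256(left_half + right_half)
```

The same combination works for any pair of sections that are whole subtrees; sections that straddle a subtree boundary would not combine this way, which is why alignment matters.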

For example, if I want to send you a file, I can break it up into 1 MiB chunks and send you each chunk along with its SHA-256 hash. If the hash of any individual chunk doesn't match, you can ask for that chunk again. At the end, I can sign the tree hash of the file and send you the signed hash. You can verify it just by hashing the chunk hashes together (each of which you have already verified), which is a lot faster than rehashing the entire file.
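The receiver's side of that exchange might look roughly like this. Assumptions not taken from the answer: SHA-256 leaves, parent = SHA-256(left || right), an odd last hash carried up unchanged, and a root whose signature has already been checked by some other means (signature checking itself is left out). `bad_chunks` and `verify_root` are hypothetical names.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def tree_root(hashes: list[bytes]) -> bytes:
    """Pairwise-combine hashes level by level; an odd last hash is carried up."""
    level = list(hashes)
    while len(level) > 1:
        parents = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])
        level = parents
    return level[0]


def bad_chunks(chunks: list[bytes], claimed: list[bytes]) -> list[int]:
    """Indices of chunks whose SHA-256 does not match the hash sent with them;
    these are the chunks to request again."""
    return [i for i, (chunk, h) in enumerate(zip(chunks, claimed)) if sha256(chunk) != h]


def verify_root(chunk_hashes: list[bytes], trusted_root: bytes) -> bool:
    """Check the (already signature-checked) root using only the chunk hashes.
    This hashes len(chunk_hashes) - 1 pairs of 32-byte digests, not the file."""
    return tree_root(chunk_hashes) == trusted_root


# Three 1 MiB chunks, one of which arrives corrupted.
chunks = [bytes([i]) * (1024 * 1024) for i in range(3)]
claimed = [sha256(c) for c in chunks]
received = [chunks[0], b"\xff" + chunks[1][1:], chunks[2]]
signed_root = tree_root(claimed)  # stand-in for the root the sender signs
print(bad_chunks(received, claimed))  # chunk 1 must be requested again
print(verify_root(claimed, signed_root))
```

Once every chunk passes `bad_chunks`, the final check touches only a few dozen bytes per chunk instead of the whole file.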

Why use a tree hash?

A tree hash is advantageous any time you want the hash of both a portion of a file and the entire file. With a regular hash like SHA-256, you have to hash the chunk and the entire file separately, and if the file is 8 GiB, that might take quite some time. With a tree hash, the hash of the chunk is one of the inputs to the hash of the file, so computing both takes almost no extra work.
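A small sketch of that claim, under assumed details (1 MiB SHA-256 leaves, parent = SHA-256(left || right), an odd last hash carried up unchanged; `file_and_portion_hashes` is a hypothetical name). The file's bytes go through SHA-256 once; the tree hash of a chunk-aligned portion then comes from a slice of the leaf hashes. With a flat SHA-256 you would instead need one pass over the whole file and a second pass over the portion.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # assumed leaf size: 1 MiB


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def tree_root(hashes: list[bytes]) -> bytes:
    """Pairwise-combine hashes level by level; an odd last hash is carried up."""
    level = list(hashes)
    while len(level) > 1:
        parents = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])
        level = parents
    return level[0]


def file_and_portion_hashes(data: bytes, first: int, last: int) -> tuple[bytes, bytes]:
    """Return (tree hash of the whole file, tree hash of chunks first..last-1).
    Assumes first < last. Every byte of data is hashed exactly once, inside a
    leaf hash; both results are then built from the leaf hashes alone."""
    leaves = [sha256(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)] or [sha256(b"")]
    return tree_root(leaves), tree_root(leaves[first:last])


# A 4 MiB file and the portion made of its chunks 2 and 3.
data = bytes(range(256)) * (4 * CHUNK_SIZE // 256)
whole, portion = file_and_portion_hashes(data, 2, 4)
# Sanity check only (this re-reads the portion): it matches hashing those
# 2 MiB as a file of their own.
assert portion == file_and_portion_hashes(data[2 * CHUNK_SIZE:], 0, 2)[0]
```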

How much extra work is a tree hash?

The "extra work" for computing a tree hash is actually minimal. Yes, it does require computing extra hashes, but only O(1) extra work per block. If your block size is 1 MiB, the extra work is approximately zero if your file is 1 MiB or smaller. As the data size increases, the extra work approaches one extra hash (of two concatenated block hashes) for every block of data. For SHA-256, that means the compression function runs at most two extra times per 1 MiB of data: once for the two 32-byte input hashes and once for the padding. That's not very much.
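The arithmetic can be checked with a few lines of Python. SHA-256 appends a 0x80 byte and an 8-byte length to the message and processes it in 64-byte blocks, so a parent node (two 32-byte digests, 64 bytes) costs two compression-function calls, while a 1 MiB leaf costs 16,385. The sketch assumes a binary tree, where n leaves need n − 1 parent hashes, and a file size that is a whole number of MiB; `compressions` and `tree_overhead` are hypothetical helpers.

```python
SHA256_BLOCK = 64  # SHA-256 compresses 64-byte blocks


def compressions(message_len: int) -> int:
    """SHA-256 compression-function calls for a message of message_len bytes.
    Padding appends a 0x80 byte and an 8-byte length, then rounds up to 64 bytes."""
    return (message_len + 9 + SHA256_BLOCK - 1) // SHA256_BLOCK


def tree_overhead(n_leaves: int, leaf_size: int = 1024 * 1024) -> tuple[int, int]:
    """(extra compressions for parent nodes, compressions for the leaves themselves)."""
    leaf_work = n_leaves * compressions(leaf_size)
    # Every parent hashes two 32-byte digests; a binary tree with n leaves has n - 1 parents.
    extra = (n_leaves - 1) * compressions(2 * 32)
    return extra, leaf_work


for size_mib in (1, 8 * 1024):  # a 1 MiB file and an 8 GiB file
    extra, leaf_work = tree_overhead(size_mib)
    print(f"{size_mib} MiB: {extra} extra vs {leaf_work} leaf compressions "
          f"({100 * extra / leaf_work:.4f}% overhead)")
```

For the 8 GiB case this comes to 16,382 extra calls against 134,225,920 for the data itself, roughly 0.012%.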

Dietrich Epp answered Sep 22 '22 at 14:09
