Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash Code and Checksum - what's the difference?

My understanding is that a hash code and checksum are similar things - a numeric value, computed for a block of data, that is relatively unique.

i.e. The probability of two blocks of data yielding the same numeric hash/checksum value is low enough that it can be ignored for the purposes of the application.

So do we have two words for the same thing, or are there important differences between hash codes and checksums?

like image 446
Richard Ev Avatar asked Jan 20 '09 09:01

Richard Ev


People also ask

Is hash and checksum the same?

A checksum is intended to verify (check) the integrity of data and identify data-transmission errors, while a hash is designed to create a unique digital fingerprint of the data. A checksum protects against accidental changes. A cryptographic hash protects against a very motivated attacker.

What is hash function or checksum?

A cryptographic hash, or checksum, is a digital fingerprint of a piece of data (e.g., a block of text) which can be used to check that you have an unaltered copy of that data. Emacs supports several common cryptographic hash algorithms: MD5, SHA-1, SHA-2, SHA-224, SHA-256, SHA-384 and SHA-512.

Why use a checksum?

A checksum is a string of numbers and letters that act as a fingerprint for a file against which later comparisons can be made to detect errors in the data. They are important because we use them to check files for integrity.

What is a checksum in programming?

What is a checksum for? Checksums are used to verify that the data file programmed to the Integrated Circuit (device) was programmed without error, such as errors in transmission. This is done by comparing the checksum of the data file in the programmer to the checksum of the data file on the device.


2 Answers

I would say that a checksum is necessarily a hashcode. However, not all hashcodes make good checksums.

A checksum has a special purpose --- it verifies or checks the integrity of data (some can go beyond that by allowing for error-correction). "Good" checksums are easy to compute, and can detect many types of data corruptions (e.g. one, two, three erroneous bits).

A hashcode simply describes a mathematical function that maps data to some value. When used as a means of indexing in data structures (e.g. a hash table), a low collision probability is desirable.

like image 117
Zach Scrivena Avatar answered Sep 30 '22 05:09

Zach Scrivena


There is a different purpose behind each of them:

  • Hash code - designed to be random across its domain (to minimize collisions in hash tables and such). Cryptographic hash codes are also designed to be computationally infeasible to reverse.
  • Check sum - designed to detect the most common errors in the data and often to be fast to compute (for effective checksumming fast streams of data).

In practice, the same functions are often good for both purposes. In particular, a cryptographically strong hash code is a good checksum (it is almost impossible that a random error will break a strong hash function), if you can afford the computational cost.

like image 23
Rafał Dowgird Avatar answered Sep 30 '22 04:09

Rafał Dowgird