I downloaded a file and used md5sum to see if the download was successful without corruption. I got the following value:
a7099fcf9572d91b10d0073b07e112cb ./Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
But when I checked the website I downloaded the file from, it gave me the following value.
10256 63747 Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
What is this 10-digit code? Is it not MD5?
I downloaded the file from: ftp://ftp.ensembl.org/pub/release-70/fasta/macaca_mulatta/dna/CHECKSUMS
A checksum is intended to verify (check) the integrity of data and identify data-transmission errors, while a cryptographic hash is designed to produce a digital fingerprint of the data that is, for practical purposes, unique. A checksum protects against accidental changes. A cryptographic hash protects against a very motivated attacker.
Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to verify data integrity, but only against unintentional corruption.
The md5sum command is based on the MD5 algorithm and generates 128-bit message digests, printed as 32 hexadecimal characters. You can use it to verify the integrity of files downloaded over a network connection, or to check whether two files have identical contents.
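For illustration, here is a minimal Python sketch of what md5sum computes, using the standard hashlib module. The function name md5_of_file and the chunk size are my own choices, not part of any tool mentioned above.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file as 32 hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large download does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file on the downloaded .fa.gz file should give the same 32-character value that md5sum printed.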
Wrong hash algorithm: It's possible that you, or the website, created the checksum with a different function than the one you expected. You can often identify the function that was used from the length and format of its output. Make sure you use the matching function and the right command for your operating system (Unix, Windows, Mac).
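As a rough sketch of that idea, the hypothetical helper below guesses the checksum type from the shape of the published value. The name guess_checksum_type and the lookup table are assumptions made for this example, and length is only a hint, since several algorithms share the same output size.

```python
import re

# Hex digest lengths of some common hash functions (a hint, not proof).
HEX_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256", 128: "SHA-512"}

def guess_checksum_type(field):
    """Guess which tool produced a checksum string from its format."""
    text = field.strip()
    if re.fullmatch(r"[0-9a-fA-F]+", text) and len(text) in HEX_LENGTHS:
        return HEX_LENGTHS[len(text)]
    # Two decimal numbers look like the checksum and block count from sum.
    if re.fullmatch(r"\d{1,5}\s+\d+", text):
        return "sum-style checksum and block count"
    return "unknown"

print(guess_checksum_type("a7099fcf9572d91b10d0073b07e112cb"))
print(guess_checksum_type("10256 63747"))
```

With the two values from the question, the first is reported as MD5 and the second as sum-style output.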
Ensembl uses the Unix 'sum' utility to calculate the values in the CHECKSUMS file.
Here's more info about the program: http://en.wikipedia.org/wiki/Sum_%28Unix%29
To see if your download is correct, try:
sum Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
NOTE: Ensembl has failed to update its CHECKSUMS file in the past, so a mismatch does not always mean the download is corrupt; the CHECKSUMS file itself may be out of date.
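If sum is not available on your system, the following Python sketch reproduces the BSD-style algorithm that GNU sum uses by default: a 16-bit rotating checksum plus the size in 1024-byte blocks, rounded up. It assumes the published values were generated with that default. The function names bsd_sum and bsd_sum_file are made up for this example, and the pure-Python loop will be slow on a file of this size.

```python
def bsd_sum(chunks):
    """Return (checksum, blocks) for an iterable of bytes objects."""
    checksum = 0
    total = 0
    for chunk in chunks:
        for byte in chunk:
            # Rotate the 16-bit checksum right by one bit, then add the byte.
            checksum = (checksum >> 1) + ((checksum & 1) << 15)
            checksum = (checksum + byte) & 0xFFFF
        total += len(chunk)
    # Size in 1024-byte blocks, rounded up.
    blocks = (total + 1023) // 1024
    return checksum, blocks

def bsd_sum_file(path, chunk_size=1 << 20):
    """Apply bsd_sum to a file, reading it in chunks."""
    with open(path, "rb") as handle:
        return bsd_sum(iter(lambda: handle.read(chunk_size), b""))
```

For an intact download, bsd_sum_file("Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz") would be expected to return the two numbers listed in the CHECKSUMS file.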
They are not the same thing. MD5 is a checksum but there are other checksum algorithms that are not MD5, such as SHA, CRC etc.
Generally, a checksum is a function that takes an input larger than its output and, ideally, produces a very different output even if only one bit of the input changes.
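To see that property in action, here is a small Python sketch that flips a single input bit and counts how many bits of the MD5 digest change. The sample bytes and the helper name differing_bits are arbitrary choices for the demonstration.

```python
import hashlib

def differing_bits(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"ACGTACGTACGT"
# Flip the lowest bit of the first byte and leave the rest unchanged.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.md5(original).digest()
d2 = hashlib.md5(flipped).digest()
print(differing_bits(d1, d2), "of", len(d1) * 8, "digest bits differ")
```

For a well-mixing function such as MD5, roughly half of the digest bits typically change, which is why even slight corruption is easy to spot.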
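The output you're looking at consists of two 5-digit decimal numbers, not a single 10-digit code. That matches the output of the Unix sum command, which prints a 16-bit checksum followed by the file size in blocks. It is not CRC32; the cksum command is the one that prints a 32-bit CRC together with a byte count. Run sum on your download and compare both numbers with the published ones.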
MD5 is a way to do a checksum, but there are others. CRC is one, so is SHA. All MD5 does is produce a hash code, and it is not the only algorithm to do so. I'm not sure what the 10 digit one is, but it can't be MD5.