Let me take you on a journey.
I'm trying to download and verify Apache Spark (http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz) via MD5 on a fresh Debian (Jessie) machine.
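For concreteness, fetching the tarball is just a plain wget of that URL (assuming wget is available on the box):
wget http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz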
The md5sum utility already existed on that machine without me needing to install anything.
So I continue by downloading the MD5 checksum file (http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz.md5) into the same directory as the Spark download, and then I execute:
md5sum -c spark-1.6.0-bin-hadoop2.6.tgz.md5
This fails with:
md5sum: spark-1.6.0-bin-hadoop2.6.tgz.md5: no properly formatted MD5 checksum lines found
And so I check the file's contents via cat spark-1.6.0-bin-hadoop2.6.tgz.md5:
spark-1.6.0-bin-hadoop2.6.tgz: 62 4B 16 1F 67 70 A6 E0 E0 0E 57 16 AF D0 EA 0B
That's the whole file. It looks decent to me - so maybe the Spark download itself was bad? Before running with that assumption, I'll first check what the MD5 actually is via md5sum spark-1.6.0-bin-hadoop2.6.tgz:
624b161f6770a6e0e00e5716afd0ea0b spark-1.6.0-bin-hadoop2.6.tgz
Hmm, that's a completely different format - but if you look closely you'll notice that the hex digits are actually the same (just lowercase and without the spaces). It looks like the md5sum that comes with Debian is following a different standard.
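For reference, what GNU md5sum -c expects is a line of the form "hash, two spaces, filename", so a line like the following (using the hash from the output above) would verify cleanly:
echo "624b161f6770a6e0e00e5716afd0ea0b  spark-1.6.0-bin-hadoop2.6.tgz" | md5sum -c -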
Maybe there's another way I can run this command? Let's try md5sum --help:
Usage: md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.
With no FILE, or when FILE is -, read standard input.
-b, --binary read in binary mode
-c, --check read MD5 sums from the FILEs and check them
--tag create a BSD-style checksum
-t, --text read in text mode (default)
The following four options are useful only when verifying checksums:
--quiet don't print OK for each successfully verified file
--status don't output anything, status code shows success
--strict exit non-zero for improperly formatted checksum lines
-w, --warn warn about improperly formatted checksum lines
--help display this help and exit
--version output version information and exit
The sums are computed as described in RFC 1321. When checking, the input
should be a former output of this program. The default mode is to print
a line with checksum, a character indicating input mode ('*' for binary,
space for text), and name for each FILE.
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report md5sum translation bugs to <http://translationproject.org/team/>
Full documentation at: <http://www.gnu.org/software/coreutils/md5sum>
or available locally via: info '(coreutils) md5sum invocation'
Okay, --tag seems to change the format. Let's try md5sum --tag spark-1.6.0-bin-hadoop2.6.tgz:
MD5 (spark-1.6.0-bin-hadoop2.6.tgz) = 624b161f6770a6e0e00e5716afd0ea0b
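(As an aside, that MD5 (file) = hash layout is the BSD-style checksum mentioned in the help text, and in my experience md5sum -c can verify that style as well - for example, md5sum --tag spark-1.6.0-bin-hadoop2.6.tgz > my-own.md5 followed by md5sum -c my-own.md5, where my-own.md5 is just an arbitrary scratch file name.)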
Indeed, that is a different format, but still not the one in the Apache .md5 file. So I look to the instructions on the Apache Download Mirrors page and find the following text:
Alternatively, you can verify the MD5 hash on the file. A unix program called md5 or md5sum is included in many unix distributions. It is also available as part of GNU Textutils...
So I follow that link and find that Textutils was merged into Coreutils in 2003 - so I actually want the md5sum from Coreutils. However, you can see at the bottom of the md5sum --help output that it already comes from Coreutils.
That might mean that my Coreutils is out of date. So I run apt-get update && apt-get upgrade coreutils, only to find that:
Calculating upgrade... coreutils is already the newest version.
That's a dead end then... but wait a moment, they said "md5 or md5sum"! Let's check out that lead.
The md5 command doesn't exist on this machine, so I'll try apt-get install md5:
E: Unable to locate package md5
And now I'm lost, so I turn to Google and then Stack Overflow for help. Now here I am.
So what's with the two different MD5 file formats and how can I deal with this issue (and finally verify my Apache Spark)?
I believe gpg --print-md md5 spark-1.6.0-bin-hadoop2.6.tgz should match the .md5 file's content.
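If you want the comparison done for you rather than by eye, bash process substitution makes it a one-liner (a sketch; -b tells diff to ignore whitespace differences in case the spacing doesn't line up exactly):
diff -b <(gpg --print-md md5 spark-1.6.0-bin-hadoop2.6.tgz) spark-1.6.0-bin-hadoop2.6.tgz.md5 && echo "MD5 matches"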
There were problems with the format of the md5/sha files because the script that builds the Spark release uses gpg --print-md md5 to create the checksum files. See: https://issues.apache.org/jira/browse/SPARK-5308
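Alternatively, if you'd rather stick with md5sum -c, you can massage the Apache-style line into the format coreutils expects. A rough sketch, assuming the whole checksum sits on a single line as in your file:
awk -F': ' '{gsub(/ /, "", $2); print tolower($2) "  " $1}' spark-1.6.0-bin-hadoop2.6.tgz.md5 | md5sum -c -
This strips the spaces out of the hex, lowercases it, and prints it in the hash-then-filename order that md5sum understands.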