`md5sum -c` won't work with Apache's MD5 file format

Tags: apache, md5sum

Let me take you on a journey..

I'm trying to download and verify Apache Spark (http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz) via MD5 on a fresh Debian (Jessie) machine.

The md5sum program was already installed on that machine, so I didn't need to set anything up.

So I download the MD5 checksum (http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz.md5) to the same directory as the Spark download and run:

md5sum -c spark-1.6.0-bin-hadoop2.6.tgz.md5

This fails with:

md5sum: spark-1.6.0-bin-hadoop2.6.tgz.md5: no properly formatted MD5 checksum lines found

And so I check the contents via cat spark-1.6.0-bin-hadoop2.6.tgz.md5:

spark-1.6.0-bin-hadoop2.6.tgz: 62 4B 16 1F 67 70 A6 E0  E0 0E 57 16 AF D0 EA 0B

That's the whole file. Looks decent to me - maybe the Spark download was actually bad? Before assuming that, I'll first compute the file's actual MD5 via md5sum spark-1.6.0-bin-hadoop2.6.tgz:

624b161f6770a6e0e00e5716afd0ea0b  spark-1.6.0-bin-hadoop2.6.tgz

Hmm, that's a completely different format - but if you look closely you'll notice that the hex digits are actually the same, just lowercase and without spaces. So the download itself looks fine; it seems the md5sum that comes with Debian expects a different file format.

Maybe there's another way I can run this command? Let's try md5sum --help:

Usage: md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.
With no FILE, or when FILE is -, read standard input.

  -b, --binary         read in binary mode
  -c, --check          read MD5 sums from the FILEs and check them
      --tag            create a BSD-style checksum
  -t, --text           read in text mode (default)

The following four options are useful only when verifying checksums:
      --quiet          don't print OK for each successfully verified file
      --status         don't output anything, status code shows success
      --strict         exit non-zero for improperly formatted checksum lines
  -w, --warn           warn about improperly formatted checksum lines

      --help     display this help and exit
      --version  output version information and exit

The sums are computed as described in RFC 1321.  When checking, the input
should be a former output of this program.  The default mode is to print
a line with checksum, a character indicating input mode ('*' for binary,
space for text), and name for each FILE.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report md5sum translation bugs to <http://translationproject.org/team/>
Full documentation at: <http://www.gnu.org/software/coreutils/md5sum>
or available locally via: info '(coreutils) md5sum invocation'
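
The "former output of this program" remark in the help text can be illustrated with a round trip: md5sum -c accepts a checksum file that md5sum itself wrote. This is only a sketch; roundtrip_check and the .md5sum suffix are made-up names, chosen so the Apache .md5 file is not overwritten.

```shell
# Round trip: write a checksum file with md5sum, then verify against it.
roundtrip_check() {
    # $1: file to checksum; the result is written to "$1.md5sum"
    # (hypothetical suffix, so an existing "$1.md5" is left alone)
    md5sum "$1" > "$1.md5sum" && md5sum -c "$1.md5sum"
}

# Usage:
# roundtrip_check spark-1.6.0-bin-hadoop2.6.tgz
```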

Okay, --tag seems to change the format. Let's try md5sum --tag spark-1.6.0-bin-hadoop2.6.tgz:

MD5 (spark-1.6.0-bin-hadoop2.6.tgz) = 624b161f6770a6e0e00e5716afd0ea0b

Indeed, that is a different format, but still not the right one. So I look at the instructions on the Apache Download Mirrors page and find the following text:

Alternatively, you can verify the MD5 hash on the file. A unix program called md5 or md5sum is included in many unix distributions. It is also available as part of GNU Textutils...

So I follow that link and find that Textutils was merged into Coreutils in 2003 - so I actually want the md5sum from Coreutils. However, you can see at the bottom of the md5sum --help dump that it's already from Coreutils.

That might mean that my Coreutils is out of date. So I run apt-get update && apt-get upgrade coreutils, only to be told:

Calculating upgrade... coreutils is already the newest version.

That's a dead end then. But wait a moment, they said "md5 or md5sum"! Let's check out that lead.

There's no md5 command on the machine yet, so I try apt-get install md5:

E: Unable to locate package md5

And now I'm lost, so I turned to Google and then Stack Overflow for help. Now here I am.

So what's with the two different MD5 file formats and how can I deal with this issue (and finally verify my Apache Spark)?

Bilal Akil asked Oct 19 '22 16:10


1 Answer

I believe the output of gpg --print-md md5 spark-1.6.0-bin-hadoop2.6.tgz should match the .md5 file's content.
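
As a rough sketch of that check, assuming gpg is installed, the command is run from the download directory, and the checksum file is named after the download plus .md5 (verify_with_gpg is a made-up helper name, not part of gpg):

```shell
# Succeeds (exit status 0) only if `gpg --print-md md5 FILE` prints
# byte-for-byte what FILE.md5 contains.
verify_with_gpg() {
    # $1: downloaded file; its checksum is expected in "$1.md5"
    gpg --print-md md5 "$1" | cmp -s - "$1.md5"
}

# Usage:
# verify_with_gpg spark-1.6.0-bin-hadoop2.6.tgz && echo OK || echo MISMATCH
```

The comparison is exact, so it would also report a mismatch if the published file differed only in whitespace or line endings.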

The md5/sha files have this format because the script that builds the Spark release uses gpg --print-md md5 to create them, which is why md5sum -c can't parse them. See: https://issues.apache.org/jira/browse/SPARK-5308
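
If you'd rather stay with md5sum, one workaround is to rewrite the gpg-style line into the "hash, two spaces, name" layout that md5sum -c reads. A minimal sketch, assuming the .md5 file is a single line as in the question (gpg may wrap longer digests over several lines, which this does not handle); apache_md5_to_md5sum is a hypothetical helper name:

```shell
# Convert a gpg-style line ("name: 62 4B 16 ...") into md5sum's
# "hash  name" format. Assumes a single-line .md5 file.
apache_md5_to_md5sum() {
    # $1: path to the gpg-style .md5 file
    name=$(cut -d: -f1 "$1")
    # Everything after the first colon is the digest: drop spaces and
    # newlines, then lowercase the hex digits.
    hash=$(cut -d: -f2- "$1" | tr -d ' \n' | tr 'A-F' 'a-f')
    printf '%s  %s\n' "$hash" "$name"
}

# Usage: feed the converted line to md5sum -c on standard input.
# apache_md5_to_md5sum spark-1.6.0-bin-hadoop2.6.tgz.md5 | md5sum -c -
```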

delephin answered Oct 21 '22 22:10