
Calculate checksum of audio files without considering the header

I want to programmatically create a SHA1 checksum of audio files (MP3, Ogg Vorbis, FLAC). The requirement is that the checksum should stay stable even if the header (e.g. ID3) changes.
Note: the audio files don't have CRCs.

This is what I have tried so far:

1) Reading + Hashing all MPEG frames using Perl and MPEG::Audio::Frame

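use Digest::SHA1;
use MPEG::Audio::Frame;
open(FH, '<:raw', $path) or die "Cannot open $path: $!";
my $sha1 = Digest::SHA1->new;
while (my $frame = MPEG::Audio::Frame->read(\*FH)) {
    $sha1->add($frame->content());
}
print $sha1->hexdigest, "\n";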

2) Decoding + Hashing all MPEG frames using Python and libmad (pymad)

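import hashlib
import mad  # pymad

mf = mad.MadFile(path)
sha1 = hashlib.sha1()

while True:
    buf = mf.read()  # decoded PCM data; None at the end of the stream
    if buf is None:
        break
    sha1.update(buf)

print(sha1.hexdigest())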

3) Using mp3cat

> mp3cat - - < file.mp3 | sha1sum

However, none of these methods produced a stable checksum: in some cases the checksum changed after retagging the file with Picard.
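For reference, the frame-walking idea behind attempt 1 can be sketched without MPEG::Audio::Frame. The hypothetical Python below handles only MPEG-1 Layer III (no MPEG-2/2.5, no free-format bitrate), computes each frame's length as 144 * bitrate / sample_rate + padding, and hashes the bytes after the 4-byte header and optional 2-byte CRC, which is roughly analogous to hashing $frame->content(). To filter out false sync words inside tag data, it accepts a candidate frame only if it is followed by another sync word, an ID3v1 "TAG" block, or the end of the data. It illustrates the technique; it does not by itself explain the instability described above.

```python
import hashlib

# MPEG-1 Layer III lookup tables; index 0 ("free format") and 15 (reserved) are rejected.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def frame_length(header: bytes):
    """Return (frame_length, has_crc) for an MPEG-1 Layer III header, or None if invalid."""
    # 11 sync bits, version bits 11 (MPEG-1), layer bits 01 (Layer III); the last bit is protection.
    if header[0] != 0xFF or (header[1] & 0xFE) != 0xFA:
        return None
    bitrate = BITRATES_KBPS[header[2] >> 4]
    sample_rate = SAMPLE_RATES_HZ[(header[2] >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (header[2] >> 1) & 0x01
    has_crc = (header[1] & 0x01) == 0  # protection bit 0 means a 16-bit CRC follows the header
    return 144 * bitrate * 1000 // sample_rate + padding, has_crc


def _at_boundary(data: bytes, pos: int) -> bool:
    """True at the end of data, at an ID3v1 "TAG" block, or at another frame sync word."""
    if pos == len(data) or data[pos:pos + 3] == b"TAG":
        return True
    return pos + 1 < len(data) and data[pos] == 0xFF and (data[pos + 1] & 0xE0) == 0xE0


def frames_sha1(data: bytes) -> str:
    """SHA-1 over the payload of each MPEG-1 Layer III frame; bytes outside frames are skipped."""
    sha1 = hashlib.sha1()
    pos = 0
    while pos + 4 <= len(data):
        info = frame_length(data[pos:pos + 4])
        if info is not None:
            length, has_crc = info
            end = pos + length
            if end <= len(data) and _at_boundary(data, end):
                sha1.update(data[pos + 4 + (2 if has_crc else 0):end])
                pos = end
                continue
        pos += 1  # no plausible frame here: resynchronise one byte later
    return sha1.hexdigest()
```

Because bytes outside accepted frames are never hashed, leading tag data and a trailing ID3v1 block do not affect the result.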

Are there any libraries that already provide what I want?
I don't care about the programming language…

Update: I debugged this a bit further. The libmad checksum inconsistency seems to occur when libmad reports decoding errors such as "Huffman data overrun (0x0238)". Since this happens on many of the MP3 files, I'm not sure it really indicates a broken file…

Benedikt Waldvogel asked Dec 08 '08 23:12


2 Answers

If you are looking for stable hashes of the actual music, you might want to look at libOFA. Your current methods give different results because the formats can have embedded tags. Also, if you want two different files containing the same song to return the same hash, you need to account for things like bitrate and sampling frequency.

libOFA, on the other hand, can give you a stable hash that works across formats and different encodings. Might be what you want?
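On the embedded-tags point for FLAC: the tags (VORBIS_COMMENT, PICTURE, PADDING and so on) live in metadata blocks between the "fLaC" marker and the first audio frame, and each block header holds a last-block flag, a 7-bit type and a 24-bit length. A minimal sketch that skips all metadata blocks and hashes only the frames could look like this (function names are hypothetical; it assumes no non-standard ID3 tag is wrapped around the FLAC stream). The mandatory STREAMINFO block also stores an MD5 of the unencoded audio, which metaflac --show-md5sum prints; the second function reads it directly.

```python
import hashlib


def flac_audio_sha1(data: bytes) -> str:
    """SHA-1 over the FLAC audio frames, skipping every metadata block (tags, pictures, padding)."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated metadata block header")
        # Block header: 1-bit last-block flag, 7-bit block type, 24-bit big-endian body length.
        is_last = data[pos] & 0x80
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        pos += 4 + length
        if is_last:
            break
    return hashlib.sha1(data[pos:]).hexdigest()


def streaminfo_md5(data: bytes) -> str:
    """MD5 of the unencoded audio as stored in STREAMINFO (all zeros if the encoder left it unset)."""
    # STREAMINFO must be the first metadata block (type 0); its 34-byte body ends with the 16-byte MD5.
    if data[:4] != b"fLaC" or (data[4] & 0x7F) != 0:
        raise ValueError("expected 'fLaC' followed by a STREAMINFO block")
    return data[26:42].hex()
```

The STREAMINFO MD5 has the advantage of being computed over decoded samples, so it survives retagging and is independent of how the frames were encoded.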

Tobias R answered Sep 30 '22 00:09


I needed tools to quickly check whether my MP3/OGG library is still intact. For MP3 I found mp3md5.py (http://snipplr.com/view/4025/mp3-checksum-in-id3-tag/), which does the job, but I found no simple tool for Ogg Vorbis, so I wrote a small bash script for it. Both tools should tolerate modifications of the comment/ID3 tag.

#!/bin/bash

# This bash script appends an MD5SUM to the vorbiscomment and/or verifies it if it exists.
# The stored value is the MD5 of the "Total data length" line reported by ogginfo.
# Later modification of the vorbis comment does not alter the MD5SUM.
# Julian M.K.

FILE="$1"

if [[ ! -f "$FILE" || ! -r "$FILE" || ! -w "$FILE" ]] ; then
  echo "File $FILE does not exist or is not readable or writable" >&2
  exit 1
fi

OLDCRC=$(vorbiscomment -l "$FILE" | grep '^CRC=' | cut -d "=" -f 2)
NEWCRC=$(ogginfo "$FILE" | grep "Total data length:" | cut -d ":" -f 2 | md5sum | cut -d " " -f 1)

if [[ -z "$OLDCRC" ]] ; then
  echo "ADDED $FILE  $NEWCRC"
  vorbiscomment -a -t "CRC=$NEWCRC" "$FILE"
  # Rewrite the CRC to get the proper data length (I don't know why this is necessary).
  # "vorbiscomment -w" replaces ALL comments, so write back the existing tags with only CRC updated.
  NEWCRC=$(ogginfo "$FILE" | grep "Total data length:" | cut -d ":" -f 2 | md5sum | cut -d " " -f 1)
  TAGS=$(mktemp)
  vorbiscomment -l "$FILE" | grep -v '^CRC=' > "$TAGS"
  echo "CRC=$NEWCRC" >> "$TAGS"
  vorbiscomment -w -c "$TAGS" "$FILE"
  rm -f "$TAGS"
elif [[ "$OLDCRC" == "$NEWCRC" ]] ; then
  echo "VERIFIED $FILE"
else
  echo "FAILURE $FILE -- $OLDCRC - $NEWCRC"
fi
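As far as I can tell, the script above stores an MD5 of the "Total data length" line printed by ogginfo, so it detects changes in stream length rather than changes in the audio bytes themselves. A stricter alternative, sketched here in Python under the assumption of a single logical Vorbis stream (page CRCs are not verified; function names are hypothetical), reassembles the Ogg packets from each page's lacing table and hashes every packet except the Vorbis comment header (packet type 3, starting with "\x03vorbis"). Since retagging rewrites only the comment packet and the page framing around it, the hash of the remaining packets should stay stable.

```python
import hashlib


def ogg_packets(data: bytes):
    """Yield the packets of a single-stream Ogg file by reassembling page segments."""
    pos, packet = 0, b""
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"lost page sync at byte {pos}")
        n_segments = data[pos + 26]  # page header is 27 bytes, then the lacing table
        lacing = data[pos + 27:pos + 27 + n_segments]
        body = pos + 27 + n_segments
        for size in lacing:
            packet += data[body:body + size]
            body += size
            if size < 255:  # a lacing value below 255 ends the current packet
                yield packet
                packet = b""
        pos = body  # the next page starts right after this page's body
    # A packet still open at this point belongs to a truncated file and is ignored.


def vorbis_audio_sha1(data: bytes) -> str:
    """SHA-1 over every Vorbis packet except the comment header."""
    sha1 = hashlib.sha1()
    for packet in ogg_packets(data):
        if packet.startswith(b"\x03vorbis"):
            continue  # comment header: vendor string and tags
        sha1.update(len(packet).to_bytes(4, "big"))  # keep packet boundaries in the hash
        sha1.update(packet)
    return sha1.hexdigest()
```

Hashing packets rather than raw pages matters because retagging usually changes page boundaries, sequence numbers and page CRCs even though the audio packets are untouched.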
Julian Kunkel answered Sep 30 '22 02:09