I want to programmatically create a SHA1 checksum of audio files (MP3, Ogg Vorbis, Flac).
The requirement is that the checksum should be stable even if the header (eg. ID3) changes.
Note: The audio files don't have CRCs
This is what I tried by now:
my $sha1 = Digest::SHA1->new;
while (my $frame = MPEG::Audio::Frame->read(\*FH)) {
$sha1->add($frame->content());
}
mf = mad.MadFile(path)
sha1 = hashlib.sha1()
while 1:
buf = mf.read()
if (buf is None):
break
sha1.update(buf)
> mp3cat - - < file.mp3 | sha1sum
However, none of those methods provided a stable checksum. Namely, in some cases the checksum changed after retagging the file with picard.
Are there any libraries that already provide what I want?
I don't care about the programming language…
Update: I debugged the case a bit further. The libmad checksum inconsitency seems to happen in cases where libmad gets some decoding errors, like "Huffman data overrun (0x0238)". As this really happens on many of the mp3 files I'm not sure if it really indicates a broken file…
If you are looking for stable hashes for the actual music you might want to look at libOFA. Your current methods will give you different results because the formats can have embedded tags. Also if you want two different files with the same song to return the same hash you need to regard things like bitrate and sample frequencies.
libOFA on the other hand can give you a stable hash that can be used between formats and different encodings. Might be what you want?
I needed tools to quickly check if my MP3/OGG library is still valid. For MP3 I found mp3md5.py (http://snipplr.com/view/4025/mp3-checksum-in-id3-tag/) which does the job, but no simple tool for OGG Vorbis, but I coded a little bash script to do this for me. Both tools should tolerate modifications of the comment/ID3Tag.
#!/bin/bash
# This bash script appends an MD5SUM to the vorbiscomment and/or verifies it if it exists
# Later modification of the vorbis comment does not alter the MD5SUM
# Julian M.K.
FILE="$1"
if [[ ! -f "$FILE" || ! -r "$FILE" || ! -w "$FILE" ]] ; then
echo "File $FILE" does not exist or is not readable or writable
exit 1
fi
OLDCRC=`vorbiscomment "$FILE" | grep ^CRC=|cut -d "=" -f 2`
NEWCRC=`ogginfo "$FILE" |grep "Total data length:" |cut -d ":" -f 2 | md5sum |cut -d " " -f 1`
if [[ "$OLDCRC" == "" ]] ; then
echo "ADDED $FILE $NEWCRC"
vorbiscomment -a -t "CRC=$NEWCRC" "$FILE"
# rewrite CRC to get proper data length, I dont know why this is necessary
NEWCRC=`ogginfo "$FILE" |grep "Total data length:" |cut -d ":" -f 2 | md5sum |cut -d " " -f 1`
vorbiscomment -w -t "CRC=$NEWCRC" "$FILE"
elif [[ "$OLDCRC" == "$NEWCRC" ]] ; then
echo "VERIFIED $FILE"
else
echo "FAILURE $FILE -- $OLDCRC - $NEWCRC"
fi
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With