I'm trying to write a Python script that finds duplicate MP3/MP4 files by comparing the audio data itself. I have many MP3/MP4 files with similar file names but different ID3 tags. At first I tried looping through the files and using MD5 to find duplicates (ignoring file names). This, of course, didn't work when the ID3 tags didn't match.
As a result, I'm looking for a way to extract only the music data from an MP3/MP4 file so I can run it through MD5 and find any duplicates. What is the best way to go about this?
Try using id3-py or mutagen to strip out all the tags (both ID3v1 and ID3v2; they can both be on the same file), then compute the MD5 on the result.
Assuming iTunes didn't manipulate the file beyond the tags, the results should be identical. Transcoding (re-encoding the audio) changes the audio bytes themselves, so this approach won't catch duplicates that have been transcoded.
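For illustration, here is a minimal sketch of the same idea using only the standard library instead of id3-py or mutagen. It skips a leading ID3v2 tag and a trailing ID3v1 tag by hand, then hashes what is left. The function names (`strip_id3`, `audio_md5`, `find_duplicates`) are made up for this example. It assumes plain MP3 files: it does not handle APE tags, ID3v2 tags appended at the end of the file, or MP4 files, which store their metadata in a different structure. It also reads each file fully into memory, which is fine for typical song sizes.

```python
import hashlib
from collections import defaultdict


def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag or a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2: 10-byte header = "ID3", version (2 bytes), flags (1 byte),
    # then a 4-byte "synchsafe" size (only 7 bits of each byte are used).
    if end >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer flag: 10 extra bytes follow the tag
            start += 10
        start = min(start, end)  # guard against a corrupt size field

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(path):
    """MD5 hex digest of a file's contents with ID3 tags removed."""
    with open(path, "rb") as f:
        return hashlib.md5(strip_id3(f.read())).hexdigest()


def find_duplicates(paths):
    """Group paths whose tag-stripped contents hash to the same digest."""
    groups = defaultdict(list)
    for path in paths:
        groups[audio_md5(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

You could call it with something like `find_duplicates(glob.glob("music/**/*.mp3", recursive=True))`, where `music` is a placeholder for your own directory. Each returned group is a list of files whose audio bytes are identical once the tags are ignored.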