Compare two audio files [duplicate]

Tags: python, audio, mp3

Basically, I have a lot of audio files representing the same song. However, some of them are of worse quality than the original, and some have been edited so that they no longer match the original song. What I'd like to do is programmatically compare these audio files to the original and see which ones match that song, regardless of quality. A direct (byte-for-byte) comparison would obviously not work because the quality of the files varies.

I believe this could be done by analyzing the structure of the songs and comparing to the original, but I know nothing about audio engineering so that doesn't help me much. All the songs are of the same format (MP3). Also, I'm using Python, so if there are bindings for it, that would be fantastic; if not, something for the JVM or even a native library would be fine as well, as long as it runs on Linux and I can figure out how to use it.


asked Jul 03 '10 21:07

Sasha Chedygov

1 Answer

This is actually not a trivial task. I do not think any off-the-shelf library can do it. Here is a possible approach:

1. Decode the MP3 to PCM.
2. Make sure the PCM data has a specific sample rate, chosen beforehand (e.g. 16 kHz). You'll need to resample songs that have a different sample rate. A high sample rate is not required, since you need a fuzzy comparison anyway, but too low a sample rate will lose too much detail.
3. Normalize the PCM data, i.e. find the maximum sample value and rescale all samples so that the sample with the largest amplitude uses the entire dynamic range of the data format. For example, if the sample format is signed 16-bit, then after normalization the largest-amplitude sample should have the value 32767 or -32767.
4. Split the audio data into frames of a fixed number of samples (e.g. 1000 samples per frame).
5. Convert each frame to the frequency domain (FFT).
6. Calculate the correlation between the sequences of frames representing the two songs. If the correlation is greater than a certain threshold, assume the songs are the same.
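As a rough illustration of steps 2 through 6, here is a minimal NumPy sketch. It assumes the audio has already been decoded to a mono NumPy array (step 1). To stay self-contained it uses plain linear interpolation for resampling (a crude stand-in for a proper resampler) and floating-point samples normalized to 1.0 instead of 16-bit integers. The function names `resample`, `normalize`, `spectra` and `similarity` are hypothetical, and "correlation" is taken here to mean a single Pearson coefficient over the flattened magnitude spectra; the threshold you compare it against would have to be tuned on your own files.

```python
import numpy as np

TARGET_RATE = 16000  # step 2: common sample rate (16 kHz, as in the text)
FRAME_SIZE = 1000    # step 4: samples per frame


def resample(samples: np.ndarray, rate: int, target_rate: int = TARGET_RATE) -> np.ndarray:
    """Step 2: resample by linear interpolation (crude, but acceptable for a fuzzy match)."""
    x = samples.astype(np.float64)
    if rate == target_rate:
        return x
    n_out = int(round(len(x) * target_rate / rate))
    t_in = np.arange(len(x)) / rate          # original sample times in seconds
    t_out = np.arange(n_out) / target_rate   # new sample times in seconds
    return np.interp(t_out, t_in, x)


def normalize(samples: np.ndarray) -> np.ndarray:
    """Step 3: rescale so the largest-amplitude sample is +/-1.0 (float analogue of 32767)."""
    peak = np.max(np.abs(samples))
    return samples / peak if peak > 0 else samples


def spectra(samples: np.ndarray, frame_size: int = FRAME_SIZE) -> np.ndarray:
    """Steps 4-5: cut into fixed-size frames and take each frame's magnitude spectrum."""
    n_frames = len(samples) // frame_size  # the incomplete last frame is dropped
    frames = np.reshape(samples[: n_frames * frame_size], (n_frames, frame_size))
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, frame_size // 2 + 1)


def similarity(spec_a: np.ndarray, spec_b: np.ndarray) -> float:
    """Step 6: Pearson correlation of two frame sequences, cut to the shorter length."""
    n = min(len(spec_a), len(spec_b))
    return float(np.corrcoef(spec_a[:n].ravel(), spec_b[:n].ravel())[0, 1])


if __name__ == "__main__":
    # Synthetic stand-ins for real songs: random noise, a noisier copy, and unrelated noise.
    rng = np.random.default_rng(0)
    original = rng.standard_normal(2 * TARGET_RATE)
    degraded = original + 0.1 * rng.standard_normal(original.size)
    unrelated = rng.standard_normal(original.size)
    ref = spectra(normalize(original))
    print("degraded copy:", similarity(ref, spectra(normalize(degraded))))
    print("unrelated:   ", similarity(ref, spectra(normalize(unrelated))))
```

Taking the magnitude of the FFT discards phase, which likely makes the comparison somewhat more tolerant of small timing offsets and encoding artifacts. Pearson correlation is insensitive to overall scaling, so in this sketch normalization mostly matters for the silence threshold in step 3.1. Like the approach it illustrates, this frame-by-frame comparison assumes the songs are aligned in time and play at the same speed.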

Python libraries:

- PyMedia (for step 1)
- NumPy (for data processing); see also this article for some introductory info
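PyMedia's own API is not shown here. As an alternative for step 1, assuming the `ffmpeg` command-line tool is installed on your Linux machine, you can let it decode the MP3, downmix to mono and resample in one go, then read the raw samples with NumPy. This swaps in ffmpeg for PyMedia, and `decode_mp3` and `pcm_from_bytes` are hypothetical helper names for this sketch.

```python
import subprocess

import numpy as np


def pcm_from_bytes(raw: bytes) -> np.ndarray:
    """Interpret raw signed 16-bit little-endian bytes as a NumPy int16 array."""
    return np.frombuffer(raw, dtype="<i2")


def decode_mp3(path: str, rate: int = 16000) -> np.ndarray:
    """Decode an MP3 to mono 16-bit PCM at `rate` Hz by piping it through the ffmpeg CLI."""
    cmd = [
        "ffmpeg", "-v", "quiet",  # suppress ffmpeg's console output
        "-i", path,
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-ac", "1",               # downmix to mono
        "-ar", str(rate),         # resample (this also covers step 2)
        "-",                      # write to stdout
    ]
    raw = subprocess.run(cmd, check=True, capture_output=True).stdout
    return pcm_from_bytes(raw)
```

For example, `decode_mp3("original.mp3")` (a placeholder path) would return the whole song as an int16 array sampled at 16 kHz, ready for the remaining steps.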

An additional complication: your songs may have different lengths of silence at the beginning, so to avoid false negatives you may need an additional step:

3.1. Scan the PCM data from the beginning until the sound energy exceeds a predefined threshold (e.g. calculate the RMS over a sliding window of 10 samples and stop when it exceeds 1% of the dynamic range). Then discard all data up to this point.
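Step 3.1 could look like this in NumPy: a sliding-window RMS computed with a cumulative sum, using the 10-sample window and 1% threshold from the text. `trim_leading_silence` is a hypothetical name; `full_scale` is 32767 for signed 16-bit samples and would be 1.0 if you work with float samples normalized to 1.0.

```python
import numpy as np


def trim_leading_silence(samples: np.ndarray, window: int = 10,
                         threshold: float = 0.01, full_scale: float = 32767.0) -> np.ndarray:
    """Drop samples before the first window whose RMS exceeds threshold * full_scale."""
    x = samples.astype(np.float64)
    if len(x) < window:
        return samples  # too short to form a single window; leave it unchanged
    # csum[i] is the sum of x[:i] ** 2, so csum[i + window] - csum[i] covers x[i:i + window].
    csum = np.concatenate(([0.0], np.cumsum(x * x)))
    rms = np.sqrt((csum[window:] - csum[:-window]) / window)
    loud = np.flatnonzero(rms > threshold * full_scale)
    if loud.size == 0:
        return samples[:0]  # the whole clip stays below the threshold
    return samples[loud[0]:]  # keep everything from the start of the first loud window
```

Cutting at the start of the first loud window (rather than its end) keeps the onset intact. A 10-sample window is very short (under a millisecond at 16 kHz), so a single click may trigger it; a longer window is a possible adjustment.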


answered Oct 04 '22 03:10

atzz
