I want to find the offset between two arrays of timestamps. They could represent, let's say, the onset of beeps in two audio tracks.
Note: There may be extra or missing onsets in either track.
I found some information about cross-correlation (e.g. https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar) which looked promising.
I assumed that each audio track is 10 seconds in duration, and represented the beep onsets as unit impulses (a sparse, mostly-zero "square wave") with a sample rate of 44.1 kHz:
import numpy as np

rfft = np.fft.rfft
irfft = np.fft.irfft

track_1 = np.array([..., 5.2, 5.5, 7.0, ...])
# The onset in track_2 at 8.0 is "extra": it has no
# corresponding onset in track_1
track_2 = np.array([..., 7.2, 7.45, 8.0, 9.0, ...])

frequency = 44100             # sample rate (Hz)
num_samples = 10 * frequency  # 10 seconds of audio

# Impulse trains: 1 at each onset sample, 0 everywhere else
wave_1 = np.zeros(num_samples)
wave_1[(track_1 * frequency).astype(int)] = 1
wave_2 = np.zeros(num_samples)
wave_2[(track_2 * frequency).astype(int)] = 1

# Circular cross-correlation via the correlation theorem
xcor = irfft(rfft(wave_1) * np.conj(rfft(wave_2)))
# Lag in samples; indices above num_samples // 2 wrap around
# and correspond to negative offsets
offset = xcor.argmax()
This approach isn't particularly fast, but I was able to get fairly consistent results even with quite low sample rates. However... I have no idea if this is a good idea! Is there a better way to find this offset than cross-correlation?
Edit: added note about missing and extra onsets.
Cross-correlation is a measurement that tracks the movements of two or more sets of time series data relative to one another. It compares the series at every relative shift and objectively determines how well they match up, and in particular at what point the best match occurs. It is also used to evaluate the similarity between the spectra of two different systems, for example a sample spectrum and a reference spectrum, and works even when background fluctuations exceed the spectral differences caused by changes in composition.

The difference between cross-correlation and autocorrelation: cross-correlation correlates two different sequences, while autocorrelation correlates a sequence with itself.

In MATLAB, r = xcorr(x, y) returns the cross-correlation of two discrete-time sequences. Cross-correlation measures the similarity between a vector x and shifted (lagged) copies of a vector y as a function of the lag.
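As a minimal sketch of the idea (the impulse positions and the lag here are made up for illustration), np.correlate can recover a known integer lag between two impulse trains:

import numpy as np

x = np.zeros(100)
x[[10, 30, 55]] = 1   # reference impulse train
y = np.roll(x, 7)     # the same train delayed by 7 samples

# 'full' mode evaluates every possible relative shift of y against x
corr = np.correlate(y, x, mode='full')
estimated_lag = corr.argmax() - (len(x) - 1)
print(estimated_lag)  # 7

The argmax is offset by len(x) - 1 because 'full' mode starts at the most negative lag; the same bookkeeping applies to the FFT-based version in the question, where negative lags wrap to the end of the array.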
If track_1 and track_2 are timestamps of beeps and both catch all of the beeps, then there's no need to build a waveform and do cross-correlation. Just find the average delay between the two arrays of beep timestamps:
import numpy as np

frequency = 44100
num_samples = 10 * frequency
actual_time_delay = 0.020
timestamp_noise = 0.001

# Assumes track_1 and track_2 are timestamps (in samples) of the same beeps,
# with a delay and noise added
track_1 = np.array([1, 2, 10000])
track_2 = (np.array([1, 2, 10000]) + actual_time_delay * frequency
           + frequency * np.random.uniform(-timestamp_noise, timestamp_noise, len(track_1)))

calculated_time_delay = np.mean(track_2 - track_1) / frequency
print('Calculated time delay (s):', calculated_time_delay)
This yields approximately 0.020 s, depending on the random errors introduced, and is about as fast as it gets.
EDIT: If extra or missing timestamps need to be handled, the code can be modified as follows. Essentially, if the error on the timestamps is bounded, take the statistical mode of the pairwise delays between all the values: delays within the noise bound cluster together, the largest cluster identifies the true delay, and the original delays in that cluster are then averaged.
import numpy as np
from scipy.stats import mode

frequency = 44100
num_samples = 10 * frequency
actual_time_delay = 0.020

# Assumes track_1 and track_2 are timestamps (in samples) of beeps, with a
# delay, noise added, and extra/missing beeps
timestamp_noise = 0.001
timestamp_noise_threshold = 0.002
track_1 = np.array([1, 2, 5, 10000, 100000, 200000])
track_2 = np.array([1, 2, 44, 10000, 30000]) + actual_time_delay * frequency
track_2 = track_2 + frequency * np.random.uniform(-timestamp_noise, timestamp_noise, len(track_2))

# All pairwise delays between the two tracks, in seconds
deltas = []
for t1 in track_1:
    for t2 in track_2:
        deltas.append(t2 - t1)
sorted_deltas = np.sort(deltas) / frequency

# Quantize the delays to the noise threshold so that delays from matching
# beep pairs fall into the same bin
truncated_deltas = np.round(sorted_deltas / timestamp_noise_threshold) * timestamp_noise_threshold

# The most common bin identifies the true delay (np.atleast_1d smooths over
# SciPy versions where mode() returns a scalar instead of an array)
truncated_time_delay = min(np.atleast_1d(mode(truncated_deltas).mode), key=abs)

# Average the original, unquantized delays in the winning bin
calculated_time_delay = np.mean(sorted_deltas[truncated_deltas == truncated_time_delay])
print('Calculated time delay (s):', calculated_time_delay)
Obviously, if too many beeps are missing or too many extra beeps exist, the code begins to fail at some point, but in general it should work well and be far faster than generating a whole waveform and performing correlation.
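If it matters, one cheap sanity check for exactly that failure mode is to count how many pairwise delays landed in the winning bin, reusing the variables from the snippet above (the threshold here is just a heuristic of mine, tune it for your data):

# How many pairwise delays fell into the winning bin?
cluster_size = np.count_nonzero(truncated_deltas == truncated_time_delay)
# Heuristic: expect at least half of the shorter track's beeps to find a match
if cluster_size < min(len(track_1), len(track_2)) / 2:
    print('Warning: only', cluster_size, 'matched pairs; delay estimate may be unreliable')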
Yes, it does make sense. This is commonly done in MATLAB. Here is a link to a similar application:
http://www.mathworks.com/help/signal/ug/cross-correlation-of-delayed-signal-in-noise.html
Several considerations:
Cross-correlation is commonly used when the signal in question carries too much noise for simpler methods. If you don't have any noise to worry about, however, I would use a different method.
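For reference, here is a rough Python analogue of that MATLAB example (the signal length, delay, and noise level are assumed, not taken from the link), using scipy.signal:

import numpy as np
from scipy.signal import correlate, correlation_lags

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)   # reference signal
delay = 35                      # known lag to recover
# Delayed copy of x buried in additive noise
y = np.concatenate([np.zeros(delay), x]) + 0.5 * rng.standard_normal(1000 + delay)

corr = correlate(y, x, mode='full')
lags = correlation_lags(len(y), len(x), mode='full')
print(lags[corr.argmax()])      # expect 35, despite the noise

The correlation peak survives substantial noise because the noise is uncorrelated with the reference, which is exactly why cross-correlation earns its keep in the noisy case.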