Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

I think Librosa.effect.split has some problem?

Tags:

python

librosa

firstly, this function is to remove silence of an audio. here is the official description:

https://librosa.github.io/librosa/generated/librosa.effects.split.html

librosa.effects.split(y, top_db=10, *kargs)

Split an audio signal into non-silent intervals.

top_db:number > 0 The threshold (in decibels) below reference to consider as silence

return: intervals:np.ndarray, shape=(m, 2) intervals[i] == (start_i, end_i) are the start and end time (in samples) of non-silent interval i.

so this is quite straightforward, for any sound which is lower than 10dB, treat it as silence and remove from the audio. It will return me a list of intervals which are non-silent segments in the audio.

So I did a very simple example and the result confuses me: the audio i load here is a 3 second humand talking, very normal takling.

y, sr = librosa.load(file_list[0]) #load the data
print(y.shape) -> (87495,)

intervals = librosa.effects.split(y, top_db=100)
intervals -> array([[0, 87495]])

#if i change 100 to 10
intervals = librosa.effects.split(y, top_db=10)
intervals -> array([[19456, 23040],
                    [27136, 31232],
                    [55296, 58880],
                    [64512, 67072]])

how is this possible...

I tell librosa, ok, for any sound which is below 100dB, treat it as silence. under this setting, the whole audio should be treated as silence, and based on the document, it should give me array[[0,0]] something...because after remove silence, there is nothing left...

But it seems librosa returns me the silence part instead of the non-silence part.

like image 374
BarCodeReader Avatar asked Nov 13 '19 16:11

BarCodeReader


1 Answers

librosa.effects.split() It says in the documentation that it returns a numpy array that contains the intervals which contain non silent audio. These intervals of course depend on the value you assign to the parameter top_db. It does not return any audio, just the start and end points of the non-silent slices of your waveform

In your case, even if you set top_db = 100, it does not treat the entire audio as silence since they state in the documentation that they use The reference power. By default, it uses **np.max** and compares to the peak power in the signal. So setting your top_db higher than the maximal value that exists in your audio will actually result in top_db not having any effect. Here's an example:

import librosa
import numpy as np
import matplotlib.pyplot as plt

# create a hypothetical waveform with 1000 noisy samples and 1000 silent samples
nonsilent = np.random.randint(11, 100, 1000) * 100
silent = np.zeros(1000)
wave = np.concatenate((nonsilent, silent))
# look at it
print(wave.shape)
plt.plot(wave)
plt.show()

# get the noisy interval
non_silent_interval = librosa.effects.split(wave, top_db=0.1, hop_length=1000)
print(non_silent_interval)

# plot only the noisy chunk of the waveform
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()

# now set top_db higher than anything in your audio
non_silent_interval = librosa.effects.split(wave, top_db=1000, hop_length=1000)
print(non_silent_interval)

# and you'll get the entire audio again
plt.plot(wave[non_silent_interval[0][0]:non_silent_interval[0][1]])
plt.show()

You can see that non silent audio is from 0 to 1000 and the silent audio is from 1000 to 2000 samples: enter image description here

Here it only gives us the noisy chunk of the wave we created: enter image description here

And here is with top_db set at a 1000: enter image description here

That means librosa did everything that it promised to do in the documentation. Hope this helps.

like image 136
Ahmad Moussa Avatar answered Oct 06 '22 16:10

Ahmad Moussa