
Python read sound file, ogg or wav?

I want to load music in Python using soundfile. I noticed that reading an ogg file and reading a wav file yield different results, as shown below (the wav file is a conversion of the ogg file using ffmpeg). With the code below, I observe a small difference between the ogg and wav samples. Is this difference normal?

Edit: I used the following command to convert my file: ffmpeg -i filename.mp3 newfilename.wav

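import soundfile as sf

# Read each file and print its first 20 frames (one row per frame, one column per channel)
for path in ("test_inputs/Shikantaza.wav", "test_inputs/Shikantaza.ogg"):
    X, sample_rate = sf.read(path)
    print(path)
    print(X[0:20, ])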

And it outputs:

test_inputs/Shikantaza.wav
[[  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [ -3.05175781e-05  -3.05175781e-05]
 [ -3.05175781e-05   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]]
test_inputs/Shikantaza.ogg
[[  1.17459308e-06   3.78499834e-07]
 [  5.19584228e-06   2.25495864e-06]
 [  1.13173719e-05   6.28675980e-06]
 [  1.07316619e-05   4.50928837e-06]
 [  2.70867986e-06  -3.40946622e-06]
 [  5.37277947e-06   5.06399772e-07]
 [  3.64179391e-06   6.27796169e-07]
 [ -5.09244865e-06  -6.14764804e-06]
 [ -4.38827237e-06  -3.74127058e-06]
 [ -5.41250847e-06  -3.70974522e-06]
 [ -2.75347884e-06  -7.08531957e-07]
 [ -9.67129495e-07   6.15705801e-07]
 [ -4.91217952e-06  -3.82820826e-06]
 [  4.38740926e-06   6.00675048e-06]
 [ -3.00040119e-06  -4.78463562e-08]
 [ -2.18559871e-05  -1.67418439e-05]
 [ -1.57035538e-05  -8.82137283e-06]
 [ -1.28820702e-05  -5.31934711e-06]
 [ -9.44996100e-06  -8.10974825e-07]
 [ -5.33486082e-06   3.71237797e-06]]
RUser4512 asked Jun 05 '26 03:06


1 Answer

For the first file, you are decoding to 16-bit linear PCM in WAV and then converting that to floating point. For the second file, you are decoding directly to floating point. 16-bit linear PCM has less precision than floating point, so the conversion loses information. That loss is normally negligible compared to the loss from the lossy compression itself, so it can be ignored.
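To see the precision loss concretely, here is a minimal sketch that rounds a few of the ogg samples from the question to the nearest 16-bit step (1/32768). The helper name quantize_16bit is hypothetical, and plain rounding with clipping is an assumption: a real converter may also apply dither. For these rows the result matches what was printed for the wav file, where -3.05175781e-05 is exactly -1/32768.

```python
# Hypothetical helper: round a float sample to the nearest 16-bit PCM step.
# Assumes plain rounding with clipping; a real encoder may also apply dither.
def quantize_16bit(sample: float) -> float:
    step_count = 32768  # 16-bit signed PCM spans -32768..32767
    q = round(sample * step_count)
    q = max(-step_count, min(step_count - 1, q))
    return q / step_count

# (left, right) pairs copied from rows 15-17 of the ogg output in the question.
ogg_rows = [
    (-3.00040119e-06, -4.78463562e-08),
    (-2.18559871e-05, -1.67418439e-05),
    (-1.57035538e-05, -8.82137283e-06),
]

wav_like_rows = [(quantize_16bit(l), quantize_16bit(r)) for l, r in ogg_rows]
for row in wav_like_rows:
    print(row)
```

Any sample smaller in magnitude than half a step (about 1.5e-05) collapses to zero, which is why the quiet start of the track reads as exact zeros in the wav file.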

Although WAV is most often used with 16-bit linear PCM, it can also store floating-point PCM (the file will be about twice as large). To write floating point to a WAV file:

ffmpeg -i in.ogg -c:a pcm_f32le out.wav
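As a rough illustration of what 32-bit float storage buys, the sketch below packs one of the ogg samples from the question as a little-endian 32-bit float with Python's struct module and reads it back. It only models a single sample, not the WAV container, and the variable names are illustrative.

```python
import struct

sample = 1.13173719e-05  # a value from the ogg output in the question

# Store the sample as a little-endian 32-bit float and read it back.
raw_f32 = struct.pack("<f", sample)
restored = struct.unpack("<f", raw_f32)[0]

bytes_s16 = struct.calcsize("<h")  # size of one 16-bit integer sample
bytes_f32 = len(raw_f32)           # size of one 32-bit float sample
relative_error = abs(restored - sample) / abs(sample)

print(bytes_s16, bytes_f32)
print(relative_error)
```

Each float sample takes 4 bytes instead of 2, which is where the roughly doubled file size comes from, and the small value survives with only a tiny relative error instead of being rounded to zero.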

The decoders for the lossy formats may also differ and produce slightly different results. In addition, a decoder that is not gapless may only produce whole frames, and may therefore output a few extra samples at the beginning and/or end.
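To judge whether two decodes differ in a way that matters, one option is to compare their lengths and the largest sample difference over the overlapping part. The following is a sketch only: compare_decodes is a hypothetical helper, it works on one channel as a plain list, and it assumes the two signals are already aligned (padding added at the start by a non-gapless decoder would have to be trimmed first).

```python
# Hypothetical helper: compare two decoded channels sample by sample.
# Assumes both sequences start at the same point in the audio.
def compare_decodes(a, b):
    length_diff = len(a) - len(b)
    # zip stops at the shorter sequence, so only the overlap is compared
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return length_diff, max_abs_diff

# Left channel, rows 16-20 of the wav and ogg outputs in the question.
wav_left = [-3.05175781e-05, -3.05175781e-05, 0.0, 0.0, 0.0]
ogg_left = [-2.18559871e-05, -1.57035538e-05, -1.28820702e-05,
            -9.44996100e-06, -5.33486082e-06]

length_diff, max_diff = compare_decodes(wav_left, ogg_left)
half_step = 0.5 / 32768  # half of one 16-bit quantization step

print(length_diff, max_diff, max_diff <= half_step)
```

For these rows the largest difference stays below half a 16-bit step, which is consistent with the difference coming from 16-bit rounding rather than from a decoder mismatch.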

mark4o answered Jun 07 '26 23:06


