
pyaudio bytes data to librosa floating point time series

When I record audio using pyaudio with paInt16, I get 16-bit integers, each represented as two bytes. After some study, I concluded that each value must be a number between -32768 and 32767.

I saved the audio as a WAV file and loaded it back with librosa.core.load. I multiplied the returned float values by 32767 to see whether that reproduces the original 16-bit integers, but the results did not match at all.

My questions are

  1. Where does this mismatch come from?
  2. Does the original 16-bit integer data represent frequency?
  3. The librosa docs state that the load function returns a floating point time series. How is this value calculated from the original 16-bit integers?
Brandon Lee asked Nov 24 '18 20:11


1 Answer

After studying and exploring the librosa code, here are my findings.

  1. The mismatch comes from the fact that the WAV byte array is little-endian, so each pair of bytes has to be decoded low byte first.

  2. The representation is called pulse-code modulation (PCM). Each sample (a single integer) represents the amplitude of the audio signal at one instant, not a frequency, scaled to a predefined bit depth (usually 16 bits). See audio bit depth for details.

  3. Given that the PCM data is 16-bit, each sample lies in the range [-32768, 32767]. librosa simply interprets each pair of bytes as a signed short and divides by 32768 (not 32767!) to scale it down to the [-1, 1) range. Please refer to my sample code for the exact conversion.
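The conversion in point 3 can be sketched with NumPy alone. This is a minimal illustration and not the answerer's original sample code; the helper names `pcm16_bytes_to_float` and `float_to_pcm16` are hypothetical, and the input is assumed to be raw little-endian 16-bit PCM bytes such as those pyaudio returns for paInt16 on typical hardware.

```python
import numpy as np


def pcm16_bytes_to_float(raw: bytes) -> np.ndarray:
    """Decode little-endian 16-bit PCM bytes to float32 in [-1, 1).

    Hypothetical helper for illustration: each pair of bytes is read as a
    signed short ("<i2") and divided by 32768.
    """
    samples = np.frombuffer(raw, dtype="<i2")
    return samples.astype(np.float32) / 32768.0


def float_to_pcm16(y: np.ndarray) -> np.ndarray:
    """Invert the scaling: multiply by 32768 (not 32767), round, and clip."""
    scaled = np.round(y * 32768.0)
    return np.clip(scaled, -32768, 32767).astype("<i2")


# Five 16-bit samples, serialized the way a WAV data chunk stores them.
original = np.array([-32768, -1, 0, 1, 32767], dtype="<i2")
raw = original.tobytes()

y = pcm16_bytes_to_float(raw)   # floating point time series
restored = float_to_pcm16(y)    # back to the original integers

# Reading the same bytes with the wrong byte order gives unrelated values,
# which is one way to end up with the mismatch described in point 1.
wrong = np.frombuffer(raw, dtype=">i2")

print(y)
print(restored)
print(wrong)
```

Separately from the byte order and the scale factor, librosa.load resamples to 22050 Hz and mixes down to mono by default, so a sample-by-sample comparison against the recorded integers also needs `sr=None` (and `mono=False` for multi-channel recordings).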

Brandon Lee answered Sep 22 '22 05:09