pyaudio bytes data to librosa floating point time series

Question

When audio is recorded using pyaudio with paInt16, each sample is a 16-bit integer stored as two bytes. After some reading, I concluded that each value must lie between -32768 and 32767.

I saved the audio as a wav file and loaded it back with librosa.core.load. I then multiplied the returned float values by 32767 to see whether that reproduces the original 16-bit integers, but the results did not match at all.

My questions are

1. Where is this mismatch coming from?
2. Does the original 16-bit integer data represent frequency?
3. The librosa docs state that the load function returns a floating point time series. How is this value calculated from the original 16-bit integers?

Brandon Lee · Accepted Answer

After studying and exploring the librosa code, here are my findings.

1. The mismatch comes from the fact that the wav byte array is little-endian, so each pair of bytes must be decoded with the low byte first.
2. The representation is called pulse-code modulation (PCM). Each sample (a single integer) represents the amplitude of the audio signal at one instant, not a frequency, scaled to a prespecified bit range (usually 16 bits). See audio bit depth for details.
3. Given that the PCM is a 16-bit representation, each sample has a range of [-32768, 32767]. librosa simply interprets each 16-bit value as a signed short and divides by 32768 (not 32767!) to scale it down to the [-1, 1) range.
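
The conversion described in the points above can be sketched roughly as follows. This is a minimal illustration, not librosa's actual code: it assumes mono, little-endian, signed 16-bit PCM bytes such as those returned by a pyaudio stream opened with paInt16, and the function name pcm16_bytes_to_float is hypothetical. It uses only the standard library so it runs without extra dependencies.

```python
import array
import sys


def pcm16_bytes_to_float(data):
    """Decode little-endian signed 16-bit PCM bytes into floats in [-1.0, 1.0).

    Hypothetical helper for illustration; assumes mono audio.
    """
    samples = array.array("h")  # "h" = signed short, 2 bytes on common platforms
    samples.frombytes(data)     # raises ValueError if len(data) is odd
    if sys.byteorder == "big":
        # array decodes in native byte order, so swap on big-endian machines
        samples.byteswap()
    # Divide by 32768 (2**15), not 32767, so that -32768 maps exactly to -1.0
    return [s / 32768.0 for s in samples]


# Build a few test samples the way they would appear in a wav data chunk.
ints = [-32768, -1, 0, 1, 16384, 32767]
raw = b"".join(i.to_bytes(2, "little", signed=True) for i in ints)

floats = pcm16_bytes_to_float(raw)

# Multiplying back by 32768 recovers the original integers exactly,
# because dividing by a power of two is lossless in floating point.
restored = [round(f * 32768) for f in floats]

# Multiplying by 32767 instead gives non-integer values such as 16383.5,
# which is one way the round trip in the question can fail.
wrong_scale = floats[4] * 32767
```

With NumPy, the same decoding is usually written as np.frombuffer(data, dtype="<i2").astype(np.float32) / 32768.0. Note also that librosa.load resamples to 22050 Hz and mixes down to mono by default, so a sample-by-sample comparison with the recorded data only makes sense when the file is loaded with sr=None.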

Tags: audio, wav, pyaudio, librosa