When audio is recorded using pyaudio with paInt16, it gives me 16-bit integers, each represented as two bytes. With some studying, I concluded that each value must be between -32768 and 32767.
I saved the audio as a wav file and loaded it back with librosa.core.load.
I multiplied the retrieved float values by 32767 to see whether that reproduces the original 16-bit integers, but the results did not match at all.
My questions are about the floating point time series that librosa returns: how do you calculate this value from the original 16-bit integer?

After studying and exploring the librosa code, here are my findings.
The mismatch comes from the fact that the wav byte array is little-endian.
The representation is called pulse-code modulation (PCM). Each sample (a single integer) represents the magnitude of the audio signal, scaled to a prespecified bit range (usually 16 bits). Refer to audio bit depth for details.
Given that PCM is a 16-bit representation, each sample has a range of [-32768, 32767]. librosa simply interprets each 16 bits as a signed short and divides by 32768 (not 32767!) to scale down to the [-1, 1) range; the largest positive sample, 32767, maps to just under 1. Please refer to my sample code for the exact conversion.
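The conversion described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not librosa's actual code and not the original sample code. The helper names `pcm16_bytes_to_floats` and `floats_to_pcm16` are hypothetical, and the sketch assumes mono, little-endian, signed 16-bit PCM data.

```python
import struct

def pcm16_bytes_to_floats(raw):
    """Decode little-endian signed 16-bit PCM bytes into floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    # '<' means little-endian, 'h' means signed 16-bit integer (short)
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    # Divide by 32768 (2**15), not 32767
    return [s / 32768.0 for s in samples]

def floats_to_pcm16(values):
    """Invert the scaling: multiply by 32768 to get the integers back."""
    return [int(round(v * 32768.0)) for v in values]

# Four example samples packed the way they would sit in a wav data chunk
raw = struct.pack("<4h", -32768, -1, 0, 32767)
floats = pcm16_bytes_to_floats(raw)
restored = floats_to_pcm16(floats)

# Reading the same two bytes with the wrong byte order gives a different integer
wrong_endian = struct.unpack(">h", struct.pack("<h", 1))[0]

print(floats)
print(restored)
print(wrong_endian)
```

Because 32768 is a power of two, the division is exact in floating point, so multiplying the floats by 32768 should recover the original integers. Multiplying by 32767 instead gives values that are slightly off.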