Does "16bit integer PCM data" mean it's signed or unsigned?

Tags: signal-processing, audio, pcm, fmod

I'm using FMOD to develop an application that, when the user clicks the Next/Prev button, immediately starts playing the next/previous sentence of a recording, exactly from its beginning. The recording is an MP3 file containing speech without music. I got the PCM data of the MP3 file by calling Sound::lock, but Sound::getFormat only told me it was "16bit integer PCM data", without saying whether it was signed or unsigned. How would I know that?

Some articles on the Internet say that almost all 16-bit integer PCM data is signed. If my PCM data is signed, what range of values represents silence: values close to 0 (e.g. -10 to 10), or values close to -32768 (e.g. -32768 to -32750)? If it's the values close to 0, does that mean there's no difference in meaning between opposite numbers like -32767 and 32767?

I need to detect silences that are long enough (e.g. longer than 500 ms) to determine where each sentence in the speech begins.

Could anyone give me any suggestions on how to detect silence between sentences?


asked Feb 20 '15 15:02

xiaokaoy


16-bit audio is, by convention, usually signed.

Think about what PCM audio is: each sample says where along its axis the speaker should physically be at that moment in time. Perfect silence is therefore any constant value, since that represents the speaker not moving.

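0 is then the centre of the range, and usually where a microphone's output sits with no input. -32768 is the speaker as far towards one end of its axis as it can go, and 32767 is it at the other end. So loudness depends on a sample's distance from the centre, not its sign: -32767 and 32767 are equally large excursions in opposite directions.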

The safest way to detect silence would be to run a spectral analysis over the relevant range and look for periods where there is no activity in any audible frequency range.

If you're looking for pauses in speech, the easiest approach would probably be to go to an online FIR filter design tool, plug in an acceptable frequency range for speech (telephony treats it as roughly 300 Hz to 3500 Hz), your sampling rate, and however many multiplications per sample you think you can afford, then copy the coefficients it supplies. For example, assuming 37 taps across the speech range and a 44100 Hz input, I got the following, converted to a C array:

static const double coefficients[] = {
    -0.000560, -0.001290, -0.002332, -0.003606, -0.004911, -0.005921, -0.006201,
    -0.005256, -0.002610,  0.002106,  0.009059,  0.018139,  0.028924,  0.040691,  0.052479,
     0.063203,  0.071794,  0.077351,  0.079274,  0.077351,  0.071794,  0.063203,  0.052479,
     0.040691,  0.028924,  0.018139,  0.009059,  0.002106, -0.002610, -0.005256, -0.006201,
    -0.005921, -0.004911, -0.003606, -0.002332, -0.001290, -0.000560
};

If the input were doubles, then for each input position c I'd compute a filtered output value:

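/* Filtered value for input position c (filterAt is just an illustrative
   name). inputWave stands in for the input stream, conceptually unbounded;
   it must hold at least c + numberOfTaps samples. */
double filterAt(const double *inputWave, size_t c)
{
    /* the number of coefficients: 37 with the array given above */
    const size_t numberOfTaps = sizeof(coefficients) / sizeof(coefficients[0]);
    double sampledValue = 0.0;
    for (size_t coeff = 0; coeff < numberOfTaps; coeff++) {
        sampledValue += coefficients[coeff] * inputWave[c + coeff];
    }
    return sampledValue;
}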

What I've then got is a bandpass filter: ideally, only the part of the signal representing sound in the 300 to 3500 Hz range remains in the output values. In real life no such filter is perfect; increasing the number of coefficients sharpens its cutoffs, at the cost of more multiplications per sample.
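The loop above assumes double input, but FMOD's PCM16 data arrives as 16-bit integers. Here is a sketch of filtering a whole buffer of them at once. firFilterPcm16 is an illustrative name, and it assumes mono, native-endian, signed samples (for example the buffer from Sound::lock viewed as const int16_t *); interleaved stereo would need de-interleaving or averaging first:

```c
#include <stddef.h>
#include <stdint.h>

/* Runs an FIR filter over a whole buffer of signed 16-bit samples, scaling
   each sample to [-1, 1) first. Only fully overlapped outputs are produced:
   out must have room for count - taps + 1 values, and the function returns
   how many it wrote (0 if the buffer is shorter than the filter).
   (firFilterPcm16 is an illustrative name, not an FMOD function.) */
size_t firFilterPcm16(const int16_t *in, size_t count,
                      const double *coefficients, size_t taps,
                      double *out)
{
    if (taps == 0 || count < taps) {
        return 0;
    }
    size_t outCount = count - taps + 1;
    for (size_t c = 0; c < outCount; c++) {
        double sampledValue = 0.0;
        for (size_t k = 0; k < taps; k++) {
            sampledValue += coefficients[k] * (in[c + k] / 32768.0);
        }
        out[c] = sampledValue;
    }
    return outCount;
}
```

It uses the same indexing as the answer's loop (coefficients[k] * in[c + k]); since the coefficients are symmetric, that matches an ordinary convolution, and out[c] corresponds to the input around in[c + (taps - 1) / 2], which is 18 samples in for 37 taps.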

Having cut the irrelevant parts of the signal, I could then look for prolonged periods (for your purposes, longer than 500 ms) in which sampledValue stays close to 0.0.

answered Sep 25 '22 18:09

Tommy
