
Saving audio input of Android Stock speech recognition engine

I am trying to save to a file the audio data captured by Android's speech recognition service.

I implement RecognitionListener as explained here: Speech to Text on Android

save the data into a buffer as illustrated here: Capturing audio sent to Google's speech recognition server

and write the buffer to a WAV file as in here: Android Record raw bytes into WAVE file for Http Streaming

My problem is how to get the appropriate audio settings to write into the WAV file's header. When I play the WAV file I hear only strange noise with these parameters:

    short nChannels = 2;   // audio channels
    int sRate = 44100;     // sample rate
    short bSamples = 16;   // bits per sample

or nothing at all with these:

    short nChannels = 1;   // audio channels
    int sRate = 8000;      // sample rate
    short bSamples = 16;   // bits per sample

What is confusing is that, looking at the parameters of the speech recognition task in logcat, I first find Set PLAYBACK sample rate to 44100 HZ:

    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK PCM format to S16_LE (Signed 16 bit Little Endian)
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Using 2 channels for PLAYBACK.
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK sample rate to 44100 HZ
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Buffer size: 2048
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Latency: 46439

and then aInfo.SampleRate = 8000 when it plays the file to send to Google's server:

    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::InitWavParser
    12-20 14:41:36.152: DEBUG/(2364): File open Succes
    12-20 14:41:36.152: DEBUG/(2364): File SEEK End Succes
    ...
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): Data Read = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = fmt
    ...
    12-20 14:41:36.152: DEBUG/(2364): PVWAVPARSER_OK
    12-20 14:41:36.156: DEBUG/(2364): aInfo.AudioFormat = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumChannels = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.SampleRate = 8000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.ByteRate = 16000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BlockAlign = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BitsPerSample = 16
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BytesPerSample = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumSamples = 2258

So, how can I find out the right parameters to save the audio buffer as a playable WAV file?

asked Dec 20 '11 23:12 by mmmx




1 Answer

You haven't included your code for actually writing out the PCM data, so it's hard to diagnose, but if you are hearing strange noises then you most likely have the wrong endianness when writing the data, or the wrong number of channels. Getting the sample rate wrong will only make the audio sound slower or faster; if it sounds completely garbled, the mistake is probably in the number of channels or the endianness of your byte stream.
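One way to test the endianness hypothesis in code is to swap the two bytes of every 16-bit sample and listen to the result. The sketch below assumes 16-bit PCM; the class and method names (`PcmByteOrder`, `swap16`) are made up for illustration and are not part of any Android API.

```java
final class PcmByteOrder {
    private PcmByteOrder() {}

    /** Returns a copy of 16-bit PCM data with the two bytes of every sample swapped. */
    static byte[] swap16(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        byte[] out = new byte[pcm.length];
        for (int i = 0; i < pcm.length; i += 2) {
            out[i] = pcm[i + 1];      // high and low byte trade places
            out[i + 1] = pcm[i];
        }
        return out;
    }
}
```

If the swapped data sounds like speech and the original sounds like noise, the byte order was the problem; if both are garbled, the channel count or bit depth is the more likely culprit.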

To know for sure, just stream your bytes directly to a file without any header (raw PCM data). This way you can rule out any errors when writing your file header. Then use Audacity to import the raw data, experimenting with the different options (bit depth, endian, channels) until you get an audio file that sounds correct (only one will be right). You do this from File->Import->Raw Data...
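A minimal sketch of the header-less dump, assuming the captured audio arrives as `byte[]` chunks (for example in `RecognitionListener.onBufferReceived(byte[])`, if your device delivers buffers there). `RawPcmDump` and the file path are hypothetical names chosen for this example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class RawPcmDump {
    private RawPcmDump() {}

    /** Appends one captured chunk to the file, byte for byte, with no header. */
    static void append(String path, byte[] chunk) throws IOException {
        // The second constructor argument 'true' opens the file in append mode.
        try (OutputStream out = new FileOutputStream(path, true)) {
            out.write(chunk);
        }
    }
}
```

Reopening the file for every chunk keeps the sketch short; for a long recording you would probably keep one stream open and close it when recognition ends.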

Once you have identified your byte format this way, you only have to worry about whether you are setting the headers correctly. For the file format, refer to http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html. There are also existing Java solutions for writing audio files, such as Java - reading, manipulating and writing WAV files, or FMJ, although I guess these might not be usable on Android.

If you have to roll your own WAV/RIFF writer, remember that Java's I/O classes (DataOutputStream, and ByteBuffer by default) write multi-byte values in big-endian order, so any multi-byte primitives you write to your file must be written in reverse byte order to match RIFF's little-endianness.
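As a rough sketch of such a writer, the code below builds the standard 44-byte PCM header with a `ByteBuffer` switched to `ByteOrder.LITTLE_ENDIAN`, so no manual byte reversal is needed. `WavHeader` and `build` are hypothetical names, and the layout assumes plain uncompressed PCM with a single `fmt ` chunk followed by a `data` chunk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WavHeader {
    private WavHeader() {}

    /** Builds a 44-byte RIFF/WAVE header for uncompressed PCM audio. */
    static byte[] build(int channels, int sampleRate, int bitsPerSample, int dataBytes) {
        int blockAlign = channels * bitsPerSample / 8;   // bytes per sample frame
        int byteRate = sampleRate * blockAlign;          // bytes per second

        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataBytes);            // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                        // fmt chunk size for PCM
        b.putShort((short) 1);               // audio format 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataBytes);                 // number of PCM bytes that follow
        return b.array();
    }
}
```

Called as `WavHeader.build(1, 8000, 16, pcm.length)`, this yields a byte rate of 16000 and a block align of 2, the same values the `aInfo` lines in the question's log report. Whether the buffer you captured really uses those settings is still something to confirm with the raw import test.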

answered Oct 14 '22 06:10 by Malcolm Smith