
Creating suitable WAV files for Google Speech API

I'm using pyaudio to record my voice as a WAV file, with the following code:

import wave

import pyaudio


def voice_recorder():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 22050
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first.wav"

    audio = pyaudio.PyAudio()

    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("speak...")
    frames = []

    # read enough CHUNK-sized buffers to cover RECORD_SECONDS
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    # stop recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # write the captured frames with a header matching the stream settings
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

I'm using the following code for the Google Speech API, which converts the speech in the WAV file to text: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py

When I pass the WAV file generated by pyaudio to Google's code, I get the following error:

googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1beta1/speech:syncrecognize?alt=json returned "Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 22050.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 22050.">
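
The header values in that message can be confirmed locally before sending anything to the API. A minimal sketch using the standard-library wave module (the helper name wav_header_info is only illustrative, not something from pyaudio or Google's sample):

```python
import wave


def wav_header_info(path):
    """Return (channels, sample_width_in_bytes, sample_rate) from a WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()
```

For the file written by the recorder above, wav_header_info("first.wav") should report 2 channels, which is the value the API compares against the 1 channel declared in the request.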

My current workaround is to convert the WAV file to MP3 with ffmpeg and then convert the MP3 back to WAV with sox:

import os
import subprocess

def wav_to_mp3():
    # discard ffmpeg's console output
    with open(os.devnull, 'w') as FNULL:
        subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ab', '6400', '-ar', '16000', 'second.mp3', '-y'], stdout=FNULL, stderr=subprocess.STDOUT)

def mp3_to_wav():
    subprocess.call(['sox', 'second.mp3', '-r', '16000', 'son.wav'])

Google's API accepts this WAV output, but the audio quality drops so much that recognition performs poorly.

So how can I create a Google-compatible WAV file with pyaudio in the first place?

asked Jan 09 '17 18:01 by JayGatsby



1 Answer

Converting the WAV file to a mono FLAC file with avconv and sending that to the Google Speech API solved the problem:

subprocess.call(['avconv', '-i', 'first.wav', '-y', '-ar', '48000', '-ac', '1', 'last.flac'])
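
Since the error comes down to the channel count (2 in the WAV header, 1 in the request), the downmix could presumably also be done in Python without an external tool. A minimal sketch, assuming 16-bit stereo PCM input like the file pyaudio writes above; stereo_to_mono is a hypothetical helper, not part of pyaudio or the Google client:

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Average the left and right channels of a 16-bit stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # WAV sample data is little-endian signed 16-bit, interleaved L, R, L, R, ...
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()

    # Average each left/right pair; the result always fits in 16 bits.
    mono = array.array(
        "h", ((left + right) // 2 for left, right in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # sample rate is left unchanged
        dst.writeframes(mono.tobytes())
```

Calling stereo_to_mono('first.wav', 'mono.wav') keeps the original 22050 Hz sample rate and involves no lossy codec, so the request would still have to declare that same rate.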
answered Sep 20 '22 20:09 by JayGatsby