Creating suitable WAV files for Google Speech API

Tags:

python

wav

pyaudio

google-speech-api

I'm using PyAudio to record my voice as a WAV file with the following code:

import wave

import pyaudio


def voice_recorder():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 22050
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first.wav"

    audio = pyaudio.PyAudio()

    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("Speak...")
    frames = []

    for _ in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    # print("finished recording")

    # stop recording
    stream.stop_stream()
    stream.close()
    sample_width = audio.get_sample_size(FORMAT)  # query before terminating PortAudio
    audio.terminate()

    # write the captured frames as a WAV file
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(sample_width)
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()

I'm using the following sample code from Google, which sends a WAV file to the Speech API and converts the speech in it to text: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py

When I pass the WAV file generated by PyAudio to Google's code, I get the following error:

googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1beta1/speech:syncrecognize?alt=json returned "Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 22050.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 22050.">
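The error says the WAV header declares 2 channels while the request config declares 1, and the API rejects the request when the two disagree. A minimal sketch for checking this locally before sending, using only the standard-library `wave` module; `wav_header_info` and `check_matches_request` are hypothetical helper names, not part of PyAudio or Google's sample.

```python
import wave


# Hypothetical helpers, not part of PyAudio or the Google sample code.
def wav_header_info(path):
    """Return the header fields the Speech API compares with the request.

    `path` may be a filename or a binary file-like object.
    """
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),  # 2 bytes = 16-bit LINEAR16
        }


def check_matches_request(path, request_channels, request_rate):
    """Raise ValueError if the WAV header disagrees with the request settings."""
    info = wav_header_info(path)
    if info["channels"] != request_channels or info["sample_rate"] != request_rate:
        raise ValueError(
            "header has %d channel(s) at %d Hz, request says %d channel(s) at %d Hz"
            % (info["channels"], info["sample_rate"], request_channels, request_rate)
        )
    return info
```

For the recording above, `check_matches_request("first.wav", 1, 22050)` would raise `ValueError`, reproducing the mismatch locally instead of via an HTTP 400.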

As a workaround, I convert the WAV file to MP3 with ffmpeg and then convert the MP3 back to WAV with sox:

import os
import subprocess


def wav_to_mp3():
    # -ac 1: mono, -ar 16000: resample to 16 kHz, -ab 6400: request an audio bitrate of only 6400 bit/s
    with open(os.devnull, 'w') as FNULL:
        subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ab', '6400', '-ar', '16000', 'second.mp3', '-y'], stdout=FNULL, stderr=subprocess.STDOUT)


def mp3_to_wav():
    subprocess.call(['sox', 'second.mp3', '-r', '16000', 'son.wav'])

Google's API accepts this WAV output, but because the quality drops so much, recognition performs poorly.

So how can I create a Google-compatible WAV file with PyAudio in the first place?
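As a lossless alternative to the MP3 round-trip, a stereo WAV can be downmixed to mono with only the standard library. This is a minimal sketch, assuming 16-bit signed PCM input as produced by `pyaudio.paInt16`; `stereo_wav_to_mono` is a hypothetical helper name. It keeps the original sample rate, so the request config would then describe 1 channel at 22050 Hz. Passing `channels=1` to `audio.open` would avoid this step altogether, assuming the input device supports mono capture.

```python
import array
import sys
import wave


# Hypothetical helper, not part of PyAudio or the Google sample code.
def stereo_wav_to_mono(src, dst):
    """Downmix a 16-bit stereo WAV to mono by averaging left and right.

    `src` and `dst` may be filenames or binary file-like objects.
    The sample rate is copied unchanged; no lossy codec is involved.
    """
    with wave.open(src, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit; WAV sample data is little-endian
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()

    # Interleaved L, R, L, R, ... -> floor-average each pair (stays in int16 range)
    mono = array.array(
        "h",
        ((left + right) // 2 for left, right in zip(samples[0::2], samples[1::2])),
    )

    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(mono.tobytes())
```

Usage: `stereo_wav_to_mono("first.wav", "first_mono.wav")` writes a 1-channel, 22050 Hz file whose header matches a 1-channel request.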


asked Jan 09 '17 18:01

JayGatsby

1 Answer

Converting the WAV file to a mono FLAC file with avconv and sending that to the Google Speech API solved the problem:

subprocess.call(['avconv', '-i', 'first.wav', '-y', '-ar', '48000', '-ac', '1', 'last.flac'])
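The command above can be wrapped in a small helper. This is a sketch, assuming avconv (or ffmpeg, which provides the same `-i`, `-y`, `-ac` and `-ar` options) is on the PATH; `flac_command` and `convert_to_mono_flac` are hypothetical names. Making `-ar` optional lets the tool keep the recording's original rate, since upsampling 22050 Hz audio to 48000 Hz adds no information.

```python
import subprocess


# Hypothetical helpers wrapping the avconv call from the answer.
def flac_command(src, dst, rate=None, tool="avconv"):
    """Build the argument list for a mono FLAC conversion.

    -y overwrites dst, -ac 1 downmixes to one channel,
    -ar (only if rate is given) resamples the output.
    """
    cmd = [tool, "-i", src, "-y", "-ac", "1"]
    if rate is not None:
        cmd += ["-ar", str(rate)]
    cmd.append(dst)
    return cmd


def convert_to_mono_flac(src, dst, **kwargs):
    # check=True raises CalledProcessError if the tool exits with a non-zero status
    subprocess.run(flac_command(src, dst, **kwargs), check=True)
```

For example, `convert_to_mono_flac("first.wav", "last.flac", rate=48000)` runs the same conversion as the answer, while omitting `rate` keeps 22050 Hz.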


answered Sep 20 '22 20:09

JayGatsby
