I'm using pyaudio to record my voice as a WAV file, with the following code:
import pyaudio
import wave

def voice_recorder():
    FORMAT = pyaudio.paInt16           # 16-bit samples (LINEAR16)
    CHANNELS = 2
    RATE = 22050
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first.wav"

    audio = pyaudio.PyAudio()
    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("speak...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    # print("finished recording")

    # stop recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # write the captured frames to a WAV file
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()
I'm using the following code for the Google Speech API, which converts the speech in the WAV file to text: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py
When I pass the WAV file generated by pyaudio to Google's code, I get the following error:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1beta1/speech:syncrecognize?alt=json returned "Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 22050.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 22050.">
I'm using the following workaround: I convert the WAV file to MP3 with ffmpeg, and then convert the MP3 back to WAV with sox:
import os
import subprocess

def wav_to_mp3():
    FNULL = open(os.devnull, 'w')
    subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ab', '6400',
                     '-ar', '16000', 'second.mp3', '-y'],
                    stdout=FNULL, stderr=subprocess.STDOUT)

def mp3_to_wav():
    subprocess.call(['sox', 'second.mp3', '-r', '16000', 'son.wav'])
Google's API works with this WAV output, but because the quality degrades so much along the way, recognition doesn't perform well.
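A single-step conversion to a mono, 16-bit, 16 kHz PCM WAV with ffmpeg would avoid the lossy MP3 intermediate. This is only a sketch using standard ffmpeg options (the function and output names are made up), not something I have verified against the API:

def wav_to_mono_wav():
    # Downmix to one channel, resample to 16 kHz, keep 16-bit PCM (LINEAR16).
    FNULL = open(os.devnull, 'w')
    subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ar', '16000',
                     '-acodec', 'pcm_s16le', 'second_mono.wav', '-y'],
                    stdout=FNULL, stderr=subprocess.STDOUT)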
So how can I create a Google-compatible WAV file with pyaudio in the first step?
Supported file types: currently, SpeechRecognition supports the following file formats: WAV (must be in PCM/LPCM format) and AIFF.
The usual bitstream encoding is the linear pulse-code modulation (LPCM) format. WAV is an application of the Resource Interchange File Format (RIFF) bitstream format method for storing data in chunks, and thus is similar to the 8SVX and the AIFF format used on Amiga and Macintosh computers, respectively.
Mu-law (audio/mulaw) is a single-channel, lossy audio format. The data is encoded using the u-law (or mu-law) algorithm. The audio/basic format is an equivalent format that is always sampled at 8 kHz.
Converting the WAV file to FLAC with avconv and sending that to the Google Speech API solved the problem:
subprocess.call(['avconv', '-i', 'first.wav', '-y', '-ar', '48000', '-ac', '1', 'last.flac'])
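The header mismatch can also be avoided at recording time by capturing mono, 16-bit audio at a rate the API accepts. Below is a sketch of the recorder from the question with adjusted parameters (the output name first_mono.wav is arbitrary); it is based on the error message above and is not verified against the API here:

import pyaudio
import wave

def voice_recorder_mono():
    FORMAT = pyaudio.paInt16            # LINEAR16
    CHANNELS = 1                        # matches the 1-channel request config
    RATE = 16000                        # a rate commonly used with LINEAR16
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first_mono.wav"

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)
    frames = []
    for _ in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        frames.append(stream.read(CHUNK))
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # write a mono, 16-bit, 16 kHz WAV whose header matches the request
    wave_file = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wave_file.setnchannels(CHANNELS)
    wave_file.setsampwidth(audio.get_sample_size(FORMAT))
    wave_file.setframerate(RATE)
    wave_file.writeframes(b''.join(frames))
    wave_file.close()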