There seem to be issues with submitting OGG_OPUS audio to the Google Speech API: it returns no results and exits, whereas the same sample converted to LINEAR16 works fine.
I am using the standard Python libraries with synchronous submits for both samples, with the following parameters for each format:
sample = speech_client.sample(
    content,
    source_uri=None,
    encoding='LINEAR16',
    sample_rate_hertz=16000)

sample = speech_client.sample(
    content,
    source_uri=None,
    encoding='OGG_OPUS',
    sample_rate_hertz=16000)
The sample is converted to LINEAR16 via:
./ffmpeg-git-20170621-64bit-static/ffmpeg -i ./audio.opus -acodec libopus -b:a 16000 -f s16le -acodec pcm_s16le output.raw
The original audio is recorded via MediaRecorder in JavaScript from Chrome 58 (https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder). It seems perfectly fine as far as Opus audio goes, and uses the following constructor parameters:
audioBitsPerSecond=16000
mimeType="audio/webm"
The error returned for OGG_OPUS is:
ValueError: No results returned from the Speech API.
Initially I was a bit confused because ffprobe generally reports Opus audio as 48000 Hz, but that seems to be due to the codec decoding at 48000 by default regardless of the original sampling rate.
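The 48000 figure can be cross-checked against the stream's own identification header. Below is a minimal sketch, assuming the OpusHead layout from RFC 7845 (8-byte magic, then version, channel count, pre-skip, input sample rate, output gain, mapping family). The function name read_opus_head is hypothetical and not part of any Google library.

```python
import struct

def read_opus_head(data):
    """Return (channels, input_sample_rate) from the first OpusHead packet, or None."""
    idx = data.find(b"OpusHead")
    # The fixed part of the header is 8 magic bytes plus 11 bytes of fields.
    if idx < 0 or len(data) < idx + 19:
        return None
    # version (u8), channels (u8), pre-skip (u16 LE),
    # input sample rate (u32 LE), output gain (i16 LE), mapping family (u8)
    version, channels, pre_skip, input_rate, gain, mapping = struct.unpack_from(
        "<BBHIhB", data, idx + 8
    )
    return channels, input_rate
```

Passing it the first few kilobytes of the recording (for example the bytes read from ./audio.opus) should be enough, since the header sits at the start of the stream. The input sample rate field is informational only; decoders still produce 48 kHz output by default, which would match what ffprobe shows.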
The configuration you have set may not be supported, or may be incorrect. Can you please try with a WAV file and the config below:
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code='en-US')
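The sample_rate_hertz value should match what the WAV file actually contains, so it helps to read it from the file rather than guess. Here is a minimal sketch using only the standard library wave module. The helper name wav_params and its dictionary keys are hypothetical, chosen here to mirror the config fields.

```python
import wave

def wav_params(path):
    """Read the parameters of a PCM WAV file that matter for the recognition config."""
    with wave.open(path, "rb") as w:
        return {
            "sample_rate_hertz": w.getframerate(),
            "channels": w.getnchannels(),
            # LINEAR16 means 16-bit samples, so this should be 2
            "sample_width_bytes": w.getsampwidth(),
        }
```

This only works on files with a WAV header. A headerless raw PCM dump (such as the output of ffmpeg -f s16le) carries no such metadata, so its rate has to be known from how it was produced.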
You can check your audio file's actual parameters by uploading it at the following link: https://www.get-metadata.com/
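If you would rather not upload the file, a rough local check is possible too. The sketch below looks only at the leading magic bytes to guess the container. The function name sniff_container is hypothetical. Since the recorder was constructed with mimeType="audio/webm", it may be worth confirming that the file is really an Ogg stream before sending it as OGG_OPUS. That is an assumption on my part, not something the question confirms.

```python
def sniff_container(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(b"OggS"):
        return "ogg"  # Ogg page capture pattern
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm/matroska"  # EBML header ID used by WebM and Matroska
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

Usage would be something like sniff_container(open("audio.opus", "rb").read(12)). This does not validate the file; it only tells you which container the header claims to be.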