OGG_OPUS fails with Google Speech API, but seems fine with LINEAR16 on the same sample?

Submitting OGG_OPUS audio to the Google Speech API does not seem to work: the call returns no results and exits. The same sample converted to LINEAR16 works fine.

I am using the standard Python client library with synchronous requests for both samples, with the following parameters for each format:

sample = speech_client.sample(
    content,
    source_uri=None,
    encoding='LINEAR16',
    sample_rate_hertz=16000)

sample = speech_client.sample(
    content,
    source_uri=None,
    encoding='OGG_OPUS',
    sample_rate_hertz=16000)

The sample is converted to LINEAR16 via:

./ffmpeg-git-20170621-64bit-static/ffmpeg -i ./audio.opus -acodec libopus -b:a 16000 -f s16le -acodec pcm_s16le output.raw
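The command above leaves the output sample rate and channel count to ffmpeg's defaults. As a rough sketch only (the helper names `build_pcm_command` and `convert_to_linear16` and the mono default are assumptions, not part of the original post), the same conversion could be driven from Python with the rate stated explicitly, so that it matches the `sample_rate_hertz` sent to the API:

```python
import subprocess


def build_pcm_command(src, dst, sample_rate=16000, channels=1, ffmpeg="ffmpeg"):
    """Return an ffmpeg argument list that decodes src to raw 16-bit little-endian PCM."""
    return [
        ffmpeg,
        "-i", src,
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-ac", str(channels),      # output channel count (mono is an assumption)
        "-f", "s16le",             # raw container: signed 16-bit little-endian
        "-acodec", "pcm_s16le",    # PCM codec matching the raw format
        dst,
    ]


def convert_to_linear16(src, dst, **kwargs):
    """Run ffmpeg; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_pcm_command(src, dst, **kwargs), check=True)
```

Calling `convert_to_linear16("./audio.opus", "output.raw")` would need an `ffmpeg` binary on the PATH, or the `ffmpeg=` argument pointed at the static build used in the question.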

The original audio is recorded via MediaRecorder in JavaScript from Chrome 58: https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder It seems perfectly fine as far as Opus audio goes, and uses the following constructor parameters:

audioBitsPerSecond=16000
mimeType="audio/webm"
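One thing that may be worth ruling out, given `mimeType="audio/webm"`: the recorded bytes could be Opus inside a WebM container rather than an Ogg container, whereas OGG_OPUS expects Ogg framing. The sketch below checks the leading bytes of the data against well-known container signatures. The function names `sniff_container` and `sniff_file` are hypothetical, and this only identifies the container, not the codec inside it:

```python
OGG_MAGIC = b"OggS"               # every Ogg page starts with this capture pattern
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # EBML header ID used by WebM and Matroska


def sniff_container(data: bytes) -> str:
    """Guess the container format from the first few bytes of an audio blob."""
    if data.startswith(OGG_MAGIC):
        return "ogg"
    if data.startswith(EBML_MAGIC):
        return "webm-or-matroska"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 12 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(12))
```

If `sniff_file("./audio.opus")` reports `"webm-or-matroska"`, the file extension is misleading and the content would need repackaging or transcoding before being sent as OGG_OPUS.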

The error returned for OGG_OPUS is:

ValueError: No results returned from the Speech API.

Initially I was a bit confused because ffprobe generally reports Opus audio as 48000 Hz, but that seems to be because the codec decodes at 48000 Hz by default regardless of the original sampling rate.

Petros Rizos asked Jun 25 '17 21:06



1 Answer

The configuration you have set may be unsupported or otherwise invalid. Can you please try with a WAV file and the config below:

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code='en-US')

You can check the actual properties of your audio file, and so the values your config should use, by uploading it to https://www.get-metadata.com/
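For a WAV file, the same properties can also be read locally with Python's standard `wave` module, which avoids uploading the audio anywhere. This is a minimal sketch: `wav_params` and `make_silent_wav` are illustrative names, and the dictionary keys simply mirror the config field they would feed.

```python
import io
import wave


def wav_params(source):
    """Return the header values of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def make_silent_wav(sample_rate=44100, frames=100):
    """Build a tiny in-memory mono 16-bit WAV, used here only for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(wav_params(make_silent_wav()))
```

The `sample_rate_hertz` value reported for the real file is the one to pass to `RecognitionConfig`, rather than a fixed 44100.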

AALAP JETHWA answered Nov 04 '22 09:11
