I am trying to use the Google Speech-to-Text API; however, I keep getting "Specify MP3 encoding to match audio file" even though I have set up my code to go through all available encoders.
This is the file I am trying to use
I should add that if I upload the file through their UI, I do get an output, so I assume there is nothing wrong with the source file.
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

# Read the whole audio file into memory and wrap it for the API
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

# Try every combination of encoding and sample rate
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []
        print(response)
        try:
            response = client.recognize(config, audio)
            print(response)
        except:
            pass
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))

Alternatively, there is another file here in Persian ('fa-IR'), with which I face a similar issue. I initially posted the Obama file because it is easier to understand. I would appreciate it if you tested your answer with the second file as well.
It seems like you're trying every encoding value the API offers, one after another. I found that this setting:
encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
works for MP3 files. So try this:
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io

speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000
    # If this fails, try one of the other rates: 8000, 12000, 24000 or 48000

    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))


sample_recognize(speech_file)

The above code is a slightly modified sample from the Speech-to-Text docs. If that doesn't work, try looking deeper into the encoding docs and best practices. Good luck.
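Before digging through the encoding docs, it can help to confirm what the file really is, since a file extension is not always accurate. The following is a minimal sketch that guesses the container from the leading bytes of the file. The function names `sniff_audio_container` and `sniff_audio_file` are hypothetical helpers, not part of any Google library, and the MPEG frame-sync check is only a heuristic that can also match other MPEG audio streams.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file (heuristic)."""
    if header[:4] == b"fLaC":
        return "flac"  # FLAC stream marker
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"  # RIFF container holding WAVE data
    if header[:4] == b"OggS":
        return "ogg"  # Ogg page marker (Opus, Vorbis, Speex, ...)
    if header[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, commonly found at the start of MP3 files
    # MPEG audio frame sync: the first 11 bits of a frame are all set.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_audio_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

For example, `sniff_audio_file('chunk7.mp3')` would tell you whether the file starts like an MP3, which none of the encodings in the question's list describe.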
It looks like you have an unsupported audio format. The easiest fix is to convert the file to another format (FLAC is advised). You have two options:
Convert it yourself on your machine:
1) Install sox (an audio editing tool)
2) Install the encoders it needs:
* [lame](http://lame.sourceforge.net) mp3 encoder
* [flac](https://xiph.org/flac/download.html) flac encoder
3) Run the command:
sox source.mp3 --channels=1 --bits=16 dest.flac
You can also run the same command from Python:
import subprocess
subprocess.check_output(['sox', sourcePath, '--channels=1', '--bits=16', destPath])
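The conversion step can be wrapped in a small helper that fails with a clear message when sox is missing. This is only a sketch: `build_sox_command` and `convert_to_flac` are hypothetical names, and it assumes the `sox` binary and the encoders listed above are already installed and on your PATH.

```python
import shutil
import subprocess


def build_sox_command(source_path, dest_path, channels=1, bits=16):
    """Return the sox argument list used in the command shown above."""
    return [
        "sox",
        source_path,
        "--channels={}".format(channels),
        "--bits={}".format(bits),
        dest_path,
    ]


def convert_to_flac(source_path, dest_path):
    """Convert an audio file with sox; raises if sox is missing or fails."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH; install it first")
    # check=True raises CalledProcessError when sox exits with an error
    subprocess.run(build_sox_command(source_path, dest_path), check=True)
    return dest_path
```

Calling `convert_to_flac('source.mp3', 'dest.flac')` runs the same command as step 3.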
Note that you don't need to specify either sample_rate_hertz or encoding, because all of that information is in the FLAC headers themselves, so you can omit them:
config = types.RecognitionConfig(language_code="fa-IR")
response = client.recognize(config, audio)
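Putting the pieces together, a FLAC request only needs the language code and the file bytes. The sketch below is an assumption-laden illustration rather than the official sample: `build_flac_request` and `transcribe_flac` are hypothetical helper names, the config and audio are passed as plain dicts as in the other answer, and the client call assumes the `google-cloud-speech` package is installed and credentials are configured in your environment.

```python
def build_flac_request(flac_path, language_code="fa-IR"):
    """Build the config and audio dicts for a FLAC file.

    No encoding or sample rate is set, since the FLAC header carries both.
    """
    with open(flac_path, "rb") as f:
        content = f.read()
    config = {"language_code": language_code}
    audio = {"content": content}
    return config, audio


def transcribe_flac(flac_path, language_code="fa-IR"):
    """Send the FLAC file for synchronous recognition (needs credentials)."""
    # Imported lazily so the request builder works without the library.
    from google.cloud import speech_v1

    client = speech_v1.SpeechClient()
    config, audio = build_flac_request(flac_path, language_code)
    response = client.recognize(config=config, audio=audio)
    # Keep the most probable alternative of each result
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]
```

For the English file from the question, pass `language_code="en-US"` instead.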
Resources: troubleshooting