I have been trying out the IBM Watson Speech to Text API. It works with short audio files, but not with audio files that are around 5 minutes long. It gives me the error below: "watson {'code_description': 'Bad Request', 'code': 400, 'error': 'No speech detected for 30s.'}"
I am using a Watson trial account. Is there a limitation on trial accounts, or is there a bug in the code below?
Python code:
from watson_developer_cloud import SpeechToTextV1
speech_to_text = SpeechToTextV1(
    username='XXX',
    password='XXX',
    x_watson_learning_opt_out=False
)
with open('trial.flac', 'rb') as audio_file:
    print(speech_to_text.recognize(audio_file, content_type='audio/flac', model='en-US_NarrowbandModel', timestamps=False, word_confidence=False, continuous=True))
Appreciate any help!
Please see the implementation notes from the Speech to Text API Explorer for the recognize API you are attempting to use:
Implementation Notes
Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
Streaming mode
For requests to transcribe live audio as it becomes available or to transcribe multiple audio files with multipart requests, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
There are two factors here. First, there is a data size limit of 100 MB, so I would make sure you do not send files larger than that to the Speech to Text service. Second, the server will close the connection and return a 400 error if no speech is detected for the number of seconds defined by inactivity_timeout. The default value is 30 seconds, which matches the error you are seeing above.
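As a rough illustration of the first point, the file size can be checked locally before uploading. This is only a sketch: fits_size_limit is a hypothetical helper, and it assumes "100 MB" means 100 * 1000 * 1000 bytes, which is the more conservative reading of the quoted limit.

```python
import os

# Assumption: "100 MB" is read conservatively as 100 * 1000 * 1000 bytes.
MAX_UPLOAD_BYTES = 100 * 1000 * 1000


def fits_size_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, calling fits_size_limit('trial.flac') before recognize would rule out the size limit as the cause.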
I would suggest you make sure there is valid speech in the first 30 seconds of your file and/or make the inactivity_timeout parameter larger to see if the problem still exists. To make things easier, you can test the failing file and other sound files by using the API Explorer in a browser:
Speech to Text API Explorer
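To try a larger inactivity_timeout from Python, something like the following might work. It is a sketch, not tested against the service: transcribe is a hypothetical helper, client stands for the SpeechToTextV1 instance from the question, and it assumes the installed SDK's recognize method accepts inactivity_timeout as a keyword argument, as the quoted notes suggest. The value 120 is an arbitrary example.

```python
def transcribe(client, path, inactivity_timeout=120, **recognize_kwargs):
    """Send one FLAC file to client.recognize with a longer inactivity timeout.

    `client` is assumed to expose a recognize() method like SpeechToTextV1.
    Extra keyword arguments (e.g. continuous=True) are passed through as-is.
    """
    with open(path, 'rb') as audio_file:
        return client.recognize(
            audio_file,
            content_type='audio/flac',
            model='en-US_NarrowbandModel',
            inactivity_timeout=inactivity_timeout,
            **recognize_kwargs
        )
```

With the question's setup this would be called as transcribe(speech_to_text, 'trial.flac', continuous=True). If the error disappears, the audio most likely contains a stretch of more than 30 seconds without detectable speech.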
The API documentation includes the Python code below. It lets your program handle the error raised when the default 30 s timeout expires instead of failing, and it works for other errors too.
It's like a "try and except", with the extra step of defining the function as a method of a callback class.
def on_error(self, error):
    print('Error received: {}'.format(error))
Here is the link: https://cloud.ibm.com/apidocs/speech-to-text?code=python
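For context, here is a sketch of how that method might sit inside a callback class. CollectingCallback is a hypothetical name. The import assumes a recent IBM Watson Python SDK that provides ibm_watson.websocket.RecognizeCallback, and a plain stand-in base class is used if the SDK is not installed.

```python
try:
    # Assumption: a recent IBM Watson Python SDK is installed.
    from ibm_watson.websocket import RecognizeCallback
except ImportError:
    # Stand-in base class so the sketch still runs without the SDK.
    class RecognizeCallback(object):
        pass


class CollectingCallback(RecognizeCallback):
    """Hypothetical callback that records errors instead of raising them."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self.timed_out = False

    def on_error(self, error):
        # Invoked for errors reported during recognition.
        self.errors.append(error)
        print('Error received: {}'.format(error))

    def on_inactivity_timeout(self, error):
        # Invoked when no speech is detected for inactivity_timeout seconds.
        self.timed_out = True
        print('Inactivity timeout: {}'.format(error))
```

An instance of such a class would be passed as the recognize_callback argument of the SDK's WebSocket recognize method. The callback only reports the condition; the service still closes the connection when the timeout is reached.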