I successfully obtained the transcript and alternatives for a 5-minute audio file using the Google Cloud Speech API (longrunningrecognize), but I'm not getting the full text of those 5 minutes, just a short transcript, as seen below:
{
  "name": "2340863807845687922",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-09-20T13:25:57.948053Z",
    "lastUpdateTime": "2018-09-20T13:28:18.406147Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "I am recording it. I think",
            "confidence": 0.9223639
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "these techniques properly stated",
            "confidence": 0.9190353
          }
        ]
      }
    ]
  }
}
How do I get the full text generated by the transcription?
The Google Speech API is a very painful thing to work with. Besides not being able to transcribe long files, it randomly skips large chunks of audio from the transcription. Possible solutions are:
I successfully solved this issue. I had to properly convert the file with ffmpeg:
$ ffmpeg -i /home/user/audio_test.wav -ac 1 -ab 8k audio_test2.wav
Then remove silence with sox:
$ sox audio_test2.wav audio_no_silence4.wav silence -l 1 0.1 1% -1 2.0 1%
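If you need to repeat these two steps for many files, they can be scripted. Below is a minimal Python sketch that just wraps the same ffmpeg and sox commands shown above. The function names (ffmpeg_command, sox_silence_command, preprocess) are my own and purely illustrative, and it assumes both tools are installed and on your PATH.

```python
import subprocess


def ffmpeg_command(src, dst):
    """Build the same call as: ffmpeg -i SRC -ac 1 -ab 8k DST"""
    return ["ffmpeg", "-i", src, "-ac", "1", "-ab", "8k", dst]


def sox_silence_command(src, dst):
    """Build the same call as: sox SRC DST silence -l 1 0.1 1% -1 2.0 1%"""
    return ["sox", src, dst, "silence", "-l", "1", "0.1", "1%", "-1", "2.0", "1%"]


def preprocess(src, mono_path, final_path):
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    for cmd in (ffmpeg_command(src, mono_path),
                sox_silence_command(mono_path, final_path)):
        subprocess.run(cmd, check=True)
    return final_path
```

For example, preprocess("/home/user/audio_test.wav", "audio_test2.wav", "audio_no_silence4.wav") would reproduce the two commands above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.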
And fix my sync-request.json:
{
  "config": {
    "encoding": "MULAW",
    "sampleRateHertz": 8000,
    "languageCode": "pt-BR",
    "enableWordTimeOffsets": false,
    "enableAutomaticPunctuation": false,
    "enableSpeakerDiarization": true,
    "useEnhanced": true,
    "diarizationSpeakerCount": 2,
    "audioChannelCount": 1
  },
  "audio": {
    "uri": "gs://storage/audio_no_silence4.wav"
  }
}
After that I ran curl, and it is working perfectly now.
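One more note on reading the output: the response splits the audio into consecutive results, each with its own alternatives list, so the full text has to be assembled from all of them. Here is a minimal Python sketch, assuming the operation JSON has the same shape as the one in the question and that you want the first alternative of each result. The function name full_transcript is hypothetical, not part of any Google library.

```python
import json


def full_transcript(operation):
    """Join the first alternative of every result into one string.

    `operation` is the parsed JSON of a finished longrunningrecognize
    operation, shaped like the response shown in the question.
    """
    results = operation.get("response", {}).get("results", [])
    pieces = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            text = alternatives[0].get("transcript", "").strip()
            if text:  # skip results that carry no transcript text
                pieces.append(text)
    return " ".join(pieces)


# Small sample in the same shape as the response in the question.
sample = json.loads("""
{
  "done": true,
  "response": {
    "results": [
      {"alternatives": [{"transcript": "I am recording it. I think", "confidence": 0.9223639}]},
      {"alternatives": [{"transcript": "these techniques properly stated", "confidence": 0.9190353}]}
    ]
  }
}
""")
print(full_transcript(sample))
```

If the joined text is still much shorter than the audio, the missing parts were never transcribed in the first place, and the problem is on the audio or config side rather than in how the response is read.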