
Google Cloud Speech API: how to get the full text transcription of audios longer than 1 minute?

I successfully obtained the transcript and alternatives for a 5-minute audio file using the Google Cloud Speech API (longrunningrecognize), but I'm not getting the full text of those 5 minutes, only a short transcript, as seen below:

{
  "name": "2340863807845687922",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-09-20T13:25:57.948053Z",
    "lastUpdateTime": "2018-09-20T13:28:18.406147Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "I am recording it. I think",
            "confidence": 0.9223639
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "these techniques properly stated",
            "confidence": 0.9190353
          }
        ]
      }
    ]
  }
}

How do I get the full text generated by the transcription?

razimbres asked Nov 19 '25 07:11



2 Answers

The Google Speech API is a very painful thing to work with. Besides not being able to transcribe long files, it randomly skips large chunks of audio in the transcription. Possible solutions are:

  1. Split the audio into chunks using voice activity detection and transcribe every chunk separately.
  2. Use a more reasonable service like Speechmatics, which will process big files without any issue and with better accuracy.
  3. Use an open source speech recognizer like Kaldi.
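
The first option can be sketched roughly as follows. This is only an illustration, not a real voice activity detector: it uses a crude amplitude threshold as a stand-in, and the function name `split_on_silence`, the threshold, and the minimum silence length are all assumptions chosen for the example. Each returned range would then be cut out and sent to the recognizer as its own request.

```python
from typing import List, Tuple


def split_on_silence(samples: List[int], rate: int, threshold: int = 500,
                     min_silence_s: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) sample index pairs (end exclusive) of voiced regions.

    A sample counts as "voiced" when its absolute amplitude reaches
    `threshold` (an assumed value; tune it for real recordings).
    A voiced region ends once `min_silence_s` seconds of quiet follow it.
    """
    min_silence = int(min_silence_s * rate)
    chunks = []
    start = None      # start index of the voiced region being built
    silent_run = 0    # consecutive quiet samples seen inside that region
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # Close the region at the first quiet sample of the run.
                chunks.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended while a region was open; trim trailing quiet samples.
        chunks.append((start, len(samples) - silent_run))
    return chunks


# Toy signal at 10 samples per second: two bursts separated by 0.6 s of quiet.
toy = [0] * 3 + [1000] * 4 + [0] * 6 + [1000] * 2 + [0] * 2
print(split_on_silence(toy, rate=10))
```

For real recordings a dedicated VAD would likely give better boundaries than a fixed threshold.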
Nikolay Shmyrev answered Nov 22 '25 03:11



I successfully solved this issue. I had to properly convert the file with ffmpeg:

$ ffmpeg -i /home/user/audio_test.wav -ac 1 -ab 8k audio_test2.wav

Remove silence:

sox audio_test2.wav audio_no_silence4.wav silence -l 1 0.1 1% -1 2.0 1%

And fix my sync-request.json:

{
  "config": {
    "encoding": "MULAW",
    "sampleRateHertz": 8000,
    "languageCode": "pt-BR",
    "enableWordTimeOffsets": false,
    "enableAutomaticPunctuation": false,
    "enableSpeakerDiarization": true,
    "useEnhanced": true,
    "diarizationSpeakerCount": 2,
    "audioChannelCount": 1
  },
  "audio": {
    "uri": "gs://storage/audio_no_silence4.wav"
  }
}

Then run curl with this request. It is working perfectly now.
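
Once the operation returns all of its results, the full text is spread across the `results` array, one piece per entry. A minimal sketch of stitching it together, assuming the response has the same shape as the JSON in the question; `full_transcript` is a hypothetical helper name, not part of any Google client library:

```python
import json


def full_transcript(operation: dict) -> str:
    """Join the first alternative of every result into one string."""
    results = operation.get("response", {}).get("results", [])
    pieces = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Take the first listed alternative of each result.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


# Trimmed copy of the operation response shown in the question.
sample = json.loads("""
{
  "done": true,
  "response": {
    "results": [
      {"alternatives": [{"transcript": "I am recording it. I think",
                         "confidence": 0.9223639}]},
      {"alternatives": [{"transcript": "these techniques properly stated",
                         "confidence": 0.9190353}]}
    ]
  }
}
""")
print(full_transcript(sample))
```

The helper returns an empty string when the operation has no `response` yet, so it can be called safely before `done` is true.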

razimbres answered Nov 22 '25 04:11



