Google Cloud Speech API: how to get the full text transcription of audios longer than 1 minute?

Question

I successfully obtained a transcript and alternatives for a 5-minute audio file using the Google Cloud Speech API (longrunningrecognize), but I'm not getting the full text of those 5 minutes, only a short transcript, as shown below:

{
  "name": "2340863807845687922",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-09-20T13:25:57.948053Z",
    "lastUpdateTime": "2018-09-20T13:28:18.406147Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "I am recording it. I think",
            "confidence": 0.9223639
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "these techniques properly stated",
            "confidence": 0.9190353
          }
        ]
      }
    ]
  }
}

How do I get the full text generated by the transcription?

Nikolay Shmyrev · Accepted Answer

The Google Speech API is a very painful thing to work with. Besides not being able to transcribe long files, it randomly skips large chunks of audio in the transcription. Possible solutions are:

Split the audio into chunks using voice activity detection and transcribe each chunk separately.
Use a more reasonable service such as Speechmatics; they process big files without any issues and with better accuracy.
Use an open-source speech recognizer such as Kaldi.
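The first option can be sketched roughly as follows. This is a minimal illustration, not a production voice activity detector: it uses a plain amplitude threshold as a stand-in for real VAD, and the function name `split_on_silence`, the `threshold` value, and the `min_silence_s` default are all assumptions made for this example. It takes decoded PCM samples as a list of integers and returns the sample ranges that contain sound, which could then be cut out and sent for transcription one by one.

```python
from typing import List, Optional, Tuple


def split_on_silence(
    samples: List[int],
    sample_rate: int,
    threshold: int = 500,        # assumed amplitude below which a sample is "silent"
    min_silence_s: float = 0.5,  # assumed pause length that ends a chunk
) -> List[Tuple[int, int]]:
    """Return (start, end) sample index pairs for the non-silent chunks.

    `end` is exclusive. A chunk is closed once at least `min_silence_s`
    seconds of consecutive silent samples follow its last loud sample.
    """
    min_silence = max(1, int(min_silence_s * sample_rate))
    chunks: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current chunk began
    last_loud = 0                # index of the most recent loud sample

    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_silence:
            # Enough silence has passed: close the chunk at the last loud sample.
            chunks.append((start, last_loud + 1))
            start = None

    if start is not None:
        # The audio ended while a chunk was still open.
        chunks.append((start, last_loud + 1))
    return chunks
```

Short pauses (shorter than `min_silence_s`) stay inside a chunk, so words are not cut apart; only longer gaps produce a split. A dedicated VAD would likely give better boundaries on noisy recordings.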

razimbres · Answer

I successfully solved this issue. I had to properly convert the file with ffmpeg:

$ ffmpeg -i /home/user/audio_test.wav -ac 1 -ab 8k audio_test2.wav

Remove silence:

sox audio_test2.wav audio_no_silence4.wav silence -l 1 0.1 1% -1 2.0 1%

And fix my sync-request.json:

{
  "config": {
    "encoding": "MULAW",
    "sampleRateHertz": 8000,
    "languageCode": "pt-BR",
    "enableWordTimeOffsets": false,
    "enableAutomaticPunctuation": false,
    "enableSpeakerDiarization": true,
    "useEnhanced": true,
    "diarizationSpeakerCount": 2,
    "audioChannelCount": 1
  },
  "audio": {
    "uri": "gs://storage/audio_no_silence4.wav"
  }
}

And run curl after that. It is working perfectly now.
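Once the operation reports "done": true, the text is spread across the entries of "results", each of which covers a consecutive portion of the audio. A minimal sketch for joining them into one string, assuming the operation JSON has the shape shown in the question and that the first alternative of each result is the one wanted (the helper name `full_transcript` is made up for this example):

```python
from typing import Any, Dict, List


def full_transcript(operation: Dict[str, Any]) -> str:
    """Join the first alternative of every result into a single string."""
    results = operation.get("response", {}).get("results", [])
    parts: List[str] = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # some results may carry no alternatives
        text = alternatives[0].get("transcript", "").strip()
        if text:  # skip results whose transcript is missing or empty
            parts.append(text)
    return " ".join(parts)
```

The saved curl output can be loaded with json.load and passed to this function. For the response shown in the question it returns "I am recording it. I think these techniques properly stated".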
