
How can I convert an Amazon Transcribe JSON response to a caption format (SRT, WebVTT, etc.)?

I'm trying to find a package that converts the JSON response from the Amazon AWS Transcribe service, with no luck so far.

You can see an example of the JSON in the JavaScript part of the Fiddle.
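For anyone writing their own converter: in the Transcribe output the per-word data lives under `results.items`. Each item has a `type` of `pronunciation` or `punctuation`, and punctuation items carry no `start_time`/`end_time`. Below is a minimal Python sketch (an illustration, not a published package) that flattens the items into `(start, end, text)` tuples and attaches punctuation to the preceding word. `SAMPLE` is a trimmed, made-up document in that shape, not the JSON from the Fiddle.

```python
import json

# Trimmed, made-up sample in the shape of an Amazon Transcribe job output.
SAMPLE = """
{
  "results": {
    "items": [
      {"start_time": "0.04", "end_time": "0.35", "type": "pronunciation",
       "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": ","}]},
      {"start_time": "0.40", "end_time": "0.80", "type": "pronunciation",
       "alternatives": [{"confidence": "0.97", "content": "world"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def parse_words(transcribe_json):
    """Return a list of (start_seconds, end_seconds, text) tuples.

    Punctuation items have no timestamps, so they are appended to the
    text of the previous word instead of becoming entries of their own.
    """
    words = []
    for item in transcribe_json["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if words:
                start, end, text = words[-1]
                words[-1] = (start, end, text + content)
            continue
        words.append((float(item["start_time"]), float(item["end_time"]), content))
    return words


words = parse_words(json.loads(SAMPLE))
print(words)  # [(0.04, 0.35, 'Hello,'), (0.4, 0.8, 'world.')]
```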

I'd rather not take the naive approach of just bundling, say, 10 words together, since that would space the captions oddly.
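One way around fixed word counts, sketched here as an assumption rather than any tool's actual behavior, is to close a caption when the speaker pauses, when the line would get too long, or when the cue would stay on screen too long. The helper name `group_into_cues` and the thresholds (0.7 s pause, 42 characters, 6 s) are illustrative choices, not standards; tune them to taste. The input is a list of `(start_seconds, end_seconds, text)` word tuples.

```python
def group_into_cues(words, max_gap=0.7, max_chars=42, max_duration=6.0):
    """Group (start, end, text) word tuples into caption cues.

    A new cue starts when the silence before a word exceeds max_gap
    seconds, when adding the word would push the cue text past
    max_chars characters, or when the cue would last longer than
    max_duration seconds. The default thresholds are illustrative only.
    Returns a list of (start, end, text) cues.
    """
    cues = []
    current = []  # words in the cue being built
    for start, end, text in words:
        if current:
            cue_start = current[0][0]
            prev_end = current[-1][1]
            cue_text = " ".join(w[2] for w in current)
            too_long = len(cue_text) + 1 + len(text) > max_chars
            too_late = end - cue_start > max_duration
            paused = start - prev_end > max_gap
            if paused or too_long or too_late:
                cues.append((cue_start, prev_end, cue_text))
                current = []
        current.append((start, end, text))
    if current:
        cues.append((current[0][0], current[-1][1], " ".join(w[2] for w in current)))
    return cues


# Hypothetical word timings: note the 1.5 s pause before "Next".
words = [
    (0.0, 0.4, "Hello"), (0.5, 0.9, "there,"), (1.0, 1.3, "everyone."),
    (2.8, 3.1, "Next"), (3.2, 3.6, "topic."),
]
for cue in group_into_cues(words):
    print(cue)
# (0.0, 1.3, 'Hello there, everyone.')
# (2.8, 3.6, 'Next topic.')
```

Because the break happens at the natural pause, the captions follow the speech rhythm instead of an arbitrary word count.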

I'd even accept a programmatic way of doing it using the Google Speech service or Speechmatics; they all return a JSON file broken down by word.
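Because all of these services report timings per word, the caption format itself can stay vendor-neutral once the words are grouped into `(start, end, text)` cues. A minimal sketch of WebVTT output under that assumption (`cues_to_webvtt` and `vtt_timestamp` are hypothetical helper names): the file starts with a `WEBVTT` line, and timestamps use a period before the milliseconds.

```python
def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def cues_to_webvtt(cues):
    """Serialize (start, end, text) cues as a WebVTT document.

    Cue identifiers are optional in WebVTT, so they are omitted here.
    """
    out = ["WEBVTT", ""]
    for start, end, text in cues:
        out.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        out.append(text)
        out.append("")  # blank line ends each cue
    return "\n".join(out)


# Hypothetical cues, e.g. produced by grouping per-word results.
cues = [(0.0, 1.3, "Hello there, everyone."), (2.8, 3.6, "Next topic.")]
print(cues_to_webvtt(cues))
```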

Has anyone worked with this before?

Thanks!

Daniel Angel asked Jan 31 '18 16:01


3 Answers

Inspired by Yash's answer, I took his code and made small changes. Feel free to use it.

https://apoorv.blog/aws-transcribe-json-to-srt.html

I use this tool for my own purposes, so expect it to stay updated.

Apoorv Mote answered Oct 23 '22 17:10


You've probably found a way to do this or written a script by now. I also tried to find a ready-made solution, and ended up writing some JavaScript code to generate SRT from the JSON output of Amazon Transcribe.

https://www.yash.info/aws-srt-creator.htm

It breaks sentences at periods (.). It's a standalone HTML file, so feel free to download it and modify it as required.
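The same sentence-based idea can be sketched in Python (this is not a port of the linked HTML tool, whose code isn't reproduced here). It buffers `(start, end, text)` word tuples until a word ends in `.`, `?`, or `!` (the last two are an addition beyond breaking only at periods), then writes one numbered SRT cue per sentence. SRT timestamps use `HH:MM:SS,mmm`, with a comma before the milliseconds. The helper names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def sentences_to_srt(words):
    """Build SRT text from (start, end, text) word tuples.

    Words are buffered until one ends with sentence-final punctuation
    (assumed here to be '.', '?' or '!'), then the buffer becomes one
    numbered cue.
    """
    blocks = []
    buffer = []
    for word in words:
        buffer.append(word)
        if word[2].endswith((".", "?", "!")):
            blocks.append(buffer)
            buffer = []
    if buffer:  # trailing words without final punctuation
        blocks.append(buffer)

    lines = []
    for index, block in enumerate(blocks, start=1):
        lines.append(str(index))
        lines.append(f"{srt_timestamp(block[0][0])} --> {srt_timestamp(block[-1][1])}")
        lines.append(" ".join(w[2] for w in block))
        lines.append("")  # blank line separates cues
    return "\n".join(lines)


# Hypothetical word timings.
words = [(0.0, 0.4, "Hello"), (0.5, 1.3, "everyone."), (62.5, 63.25, "Welcome.")]
print(sentences_to_srt(words))
```

Long sentences will produce long cues with this approach, which is why it is often combined with a maximum-length or pause-based split.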

Yash Gadhiya answered Oct 23 '22 17:10


I've used this Python script from GitHub, and it formats the transcript really nicely into .docx. The output even includes scatterplots of the word confidence levels and changes the color of lower-confidence words.

https://github.com/kibaffo33/aws_transcribe_to_docx

This worked really well for me, and I think you could make it output HTML fairly simply if you were willing to alter the Python script.
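As a rough idea of the HTML route (a standalone sketch, not code taken from the linked repository), the following renders Transcribe-style `results.items` as a paragraph and wraps low-confidence words in `<mark>`. The 0.8 cut-off and the `items_to_html` name are assumptions for illustration.

```python
import html

# Illustrative cut-off (an assumption): words below this confidence get highlighted.
LOW_CONFIDENCE = 0.8


def items_to_html(items, threshold=LOW_CONFIDENCE):
    """Render Transcribe-style result items as one HTML paragraph.

    Pronunciation items whose confidence is below `threshold` are
    wrapped in <mark>; punctuation is appended without a leading space.
    """
    parts = []
    for item in items:
        alt = item["alternatives"][0]
        text = html.escape(alt["content"])
        if item["type"] == "punctuation":
            parts.append(text)
            continue
        confidence = alt["confidence"]
        if float(confidence) < threshold:
            text = f'<mark title="confidence {html.escape(confidence)}">{text}</mark>'
        # Separate words with a space, except before the very first one.
        parts.append(" " + text if parts else text)
    return "<p>" + "".join(parts) + "</p>"


# Made-up items in the shape of Transcribe's results.items.
items = [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
     "alternatives": [{"confidence": "0.42", "content": "world"}]},
    {"type": "punctuation",
     "alternatives": [{"confidence": "0.0", "content": "."}]},
]
print(items_to_html(items))
# <p>Hello <mark title="confidence 0.42">world</mark>.</p>
```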

Tim Clauss answered Oct 23 '22 18:10