
Extract automatic captions from YouTube video

I'm having problems extracting automatic captions from YouTube videos.

I tried using the http://video.google.com/timedtext?type=track&v=3wszM2SA12E&name=Automatic&lang=en method, but it only works for videos that have named tracks. For example, this video has no named tracks (only automatic captions) and does not load: rrkrvAUbU9Y

There are several web applications that can do it (like http://www.serpsite.com/youtube-subtitles-download-tool/ and http://mo.dbxdb.com/), but I need a script, because I want to use it for my research.

Does anyone have an idea of the correct way to get these? YouTube's API has something about captions, but only for registered users, while the apps above work for all videos, and I doubt they just scrape the HTML from the page (although that's possible too). There must be a way... please help!

Aerodynamika asked Dec 23 '12 18:12


1 Answer

Here are my suggestions after spending some time on this:

  • JS library: https://github.com/syzer/youtube-captions-scraper (supports auto-generated captions).

  • The two quick methods below do not support auto-generated captions:

    • Get a list of subtitles: http://video.google.com/timedtext?type=list&v=lT3vGaOLWqE
    • Get subtitle with track id: http://video.google.com/timedtext?type=track&v=lT3vGaOLWqE&id=0&lang=en
  • Quick download: http://downsub.com/?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dag_EJRhMfOM
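
Since the question asks for a script, here is a minimal Python sketch of the two timedtext URLs above. The endpoint and query parameters come from the links in this answer; the function names are my own, and the XML layouts (a list of `<track>` elements with `id`, `lang_code` and `name` attributes, and a transcript made of `<text start="...">` elements) are assumptions about what the endpoint returns, so check them against a real response before relying on this.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the links above.
TIMEDTEXT_URL = "http://video.google.com/timedtext"


def list_url(video_id):
    """Build the URL that lists the caption tracks of a video."""
    return TIMEDTEXT_URL + "?" + urlencode({"type": "list", "v": video_id})


def track_url(video_id, track_id, lang):
    """Build the URL that fetches one caption track by id and language."""
    query = {"type": "track", "v": video_id, "id": track_id, "lang": lang}
    return TIMEDTEXT_URL + "?" + urlencode(query)


def parse_track_list(xml_text):
    """Return (id, lang_code, name) for each <track> element.

    Assumed layout: <transcript_list><track id=... lang_code=... name=.../>...
    """
    root = ET.fromstring(xml_text)
    return [
        (t.get("id"), t.get("lang_code"), t.get("name", ""))
        for t in root.iter("track")
    ]


def parse_transcript(xml_text):
    """Return (start_seconds, text) for each <text> element.

    Assumed layout: <transcript><text start=... dur=...>caption</text>...
    Caption text may carry escaped entities, so it is unescaped once more.
    """
    root = ET.fromstring(xml_text)
    return [
        (float(t.get("start", "0")), html.unescape(t.text or ""))
        for t in root.iter("text")
    ]


if __name__ == "__main__":
    print(list_url("lT3vGaOLWqE"))
    print(track_url("lT3vGaOLWqE", 0, "en"))
```

Fetching is left out on purpose; something like `urllib.request.urlopen(list_url(video_id)).read()` would supply the XML for `parse_track_list`, and the same call with `track_url(...)` would supply it for `parse_transcript`. As noted above, these two URLs do not return auto-generated captions.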

Solominh answered Sep 20 '22 16:09