
How do some sites download YouTube captions?

This is somewhat of a duplicate of Does YouTube API forbid to download video captions if you are not it's owner? and Get YouTube captions, which basically say that it's not possible to download captions via the YouTube API unless you are the owner or third-party contributions are enabled. However, my question is: how do sites like http://downsub.com/ or http://www.lilsubs.com/ have access to all captions?

In other words, when I access the YouTube API myself (even with the youtubepartner and youtube.force-ssl scopes), I can only download the captions of some videos. The rest fail with 403: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video order might not have enabled third-party contributions for this caption. Yet when I try those same videos on these other sites, it works fine. I'm assuming they are using the YouTube API to access the captions, but what special sauce are they using? Some special partner key? A different API version? Or are they just scraping the captions from the video pages themselves?

ryanbrainard asked Oct 21 '17 at 14:10



1 Answer

A 2022 answer:

Option 1: Send a curl request to the video's web page, for example curl -L "https://youtu.be/YbJOTdZBX1g", and search the result for timedtext. This gives you a URL. Replace each \u0026 in it with & and you have the link to the subtitles.
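The steps in Option 1 can be sketched in Python roughly as follows. This is only an illustration: the helper names (fetch_watch_page, extract_timedtext_urls) and the regular expression are my own assumptions, not anything YouTube documents. The page markup is not a stable interface, so the pattern may need adjusting, and the extracted URLs may expire or stop working.

```python
import re
import urllib.request

# Assumption: the timedtext link appears in the page source as a
# double-quoted JSON string. The markup is undocumented and may change.
TIMEDTEXT_RE = re.compile(r'"(https?://[^"\s]*timedtext[^"]*)"')


def extract_timedtext_urls(page_html: str) -> list[str]:
    """Return the unique timedtext URLs found in the page source."""
    urls = []
    for raw in TIMEDTEXT_RE.findall(page_html):
        # Undo the JSON escaping: \u0026 is an escaped ampersand.
        url = raw.replace("\\u0026", "&").replace("\\/", "/")
        if url not in urls:
            urls.append(url)
    return urls


def fetch_watch_page(url: str) -> str:
    """Download the page source, following redirects like curl -L does."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


# Example usage (performs a network request, so it is left commented out):
# html = fetch_watch_page("https://youtu.be/YbJOTdZBX1g")
# for subtitle_url in extract_timedtext_urls(html):
#     print(subtitle_url)
```

Keeping the parsing separate from the download makes it easy to check the extraction against a saved copy of the page before relying on it.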

Option 2: Use the yt-dlp package:

# For installing see: https://github.com/yt-dlp/yt-dlp#with-pip
from yt_dlp import YoutubeDL

ydl_opts = {
    "skip_download": True,
    "writesubtitles": True,
    "subtitleslangs": ["all", "-live_chat"],
    # Looks like formats available are vtt, ttml, srv3, srv2, srv1, json3
    "subtitlesformat": "json3",
    # You can skip the following option
    "sleep_interval_subtitles": 1,
}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(["YbJOTdZBX1g"])
C-Y answered Sep 22 '22 at 11:09