
Python: get all YouTube video URLs of a channel

I want to get all video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the following code, but how can I get ALL video links (>500)?

import urllib, json
author = 'Youtube_Username'
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print first['title'] # video title
print first['link'][0]['href'] #url
Johnny asked Mar 19 '13 23:03



2 Answers

Short answer:

Here's a library that can help with that.

pip install scrapetube

import scrapetube

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print(video['videoId'])
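
Since the question asks for URLs rather than IDs, the IDs can be turned into watch links. This is a minimal sketch, assuming each item yielded by scrapetube.get_channel is a dict with a 'videoId' key (as in the loop above). The helper name to_watch_urls is made up for illustration and is not part of scrapetube.

```python
from typing import Dict, Iterable, Iterator

BASE_VIDEO_URL = "https://www.youtube.com/watch?v="


def to_watch_urls(videos: Iterable[Dict]) -> Iterator[str]:
    """Yield a watch URL for every item that carries a 'videoId' key."""
    for video in videos:
        video_id = video.get("videoId")
        if video_id:  # skip items that have no video ID
            yield BASE_VIDEO_URL + video_id
```

It would be used as: for url in to_watch_urls(scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")): print(url). Because it is a generator, URLs are produced as the channel is being paged through rather than after everything has been fetched.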

Long answer:

I created the module mentioned above because I could not find any other workable solution. Here's what I tried:

  1. Selenium. It worked but had three big drawbacks: (1) it requires a web browser and driver to be installed; (2) it has big CPU and memory requirements; (3) it can't handle big channels.
  2. Using youtube-dl. Like this:
    import youtube_dl

    channel_id = 'UC9-y-6csu5WGm29I7JiwpnA'  # channel ID from the short answer
    youtube_dl_options = {
        'skip_download': True,   # collect metadata only, do not download media
        'ignoreerrors': True     # keep going when a single video fails
    }
    with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
        videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')

This also works for small channels, but for bigger ones I would get blocked by YouTube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).

So I made the library scrapetube, which uses the web API to get all the videos.

dermasmid answered Nov 15 '22 19:11


After the YouTube API change, max k.'s answer no longer works. As a replacement, the function below returns a list of the YouTube video URLs in a given channel. Note that you need an API key for it to work.

import json
import urllib.request

def get_all_video_in_channel(channel_id):
    api_key = 'YOUR_API_KEY'  # replace with your own API key

    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'

    first_url = base_search_url+'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)

    video_links = []
    url = first_url
    while True:
        # Python 3: urlopen lives in urllib.request
        with urllib.request.urlopen(url) as inp:
            resp = json.load(inp)

        # search results can also contain channels and playlists; keep videos only
        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])

        # a missing nextPageToken means this was the last page
        next_page_token = resp.get('nextPageToken')
        if next_page_token is None:
            break
        url = first_url + '&pageToken={}'.format(next_page_token)
    return video_links
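
The pagination loop can be checked without an API key by separating it from the HTTP call. The sketch below assumes the response shape used in the function above (items, id.kind, id.videoId, nextPageToken). The name fetch_page is a hypothetical callable standing in for the urlopen plus json.load step, and the canned pages use made-up IDs.

```python
from typing import Callable, List, Optional

BASE_VIDEO_URL = "https://www.youtube.com/watch?v="


def collect_video_links(fetch_page: Callable[[Optional[str]], dict]) -> List[str]:
    """Follow nextPageToken until a response arrives without one.

    fetch_page(token) is a hypothetical callable: it receives the page token
    (None for the first page) and returns the parsed JSON response as a dict.
    """
    links = []
    token = None
    while True:
        resp = fetch_page(token)
        for item in resp.get("items", []):
            # keep videos only; skip channels and playlists
            if item["id"]["kind"] == "youtube#video":
                links.append(BASE_VIDEO_URL + item["id"]["videoId"])
        token = resp.get("nextPageToken")
        if not token:
            break
    return links


# Two canned pages standing in for real API responses (made-up IDs).
FAKE_PAGES = {
    None: {
        "items": [
            {"id": {"kind": "youtube#video", "videoId": "aaa"}},
            {"id": {"kind": "youtube#playlist", "playlistId": "p1"}},
        ],
        "nextPageToken": "T2",
    },
    "T2": {"items": [{"id": {"kind": "youtube#video", "videoId": "bbb"}}]},
}

print(collect_video_links(FAKE_PAGES.get))
```

In real use, fetch_page would build the search URL (adding pageToken when the token is not None), open it, and return the decoded JSON. Keeping that step outside the loop also makes it easy to add a delay or retry there without touching the pagination logic.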
Stian answered Nov 15 '22 20:11