
How to get all YouTube comments with Python's gdata module?

Looking to grab all the comments from a given video, rather than going through one page at a time.

from gdata import youtube as yt
from gdata.youtube import service as yts

client = yts.YouTubeService()
client.ClientLogin(username, pwd)  # the pwd might need to be application-specific, FYI

comments = client.GetYouTubeVideoCommentFeed(video_id='the_id')
a_comment = comments.entry[0]

The above code will let you grab a single comment, likely the most recent comment, but I'm looking for a way to grab all the comments at once. Is this possible with Python's gdata module?


Relevant documentation: the YouTube API docs for comments, the comment feed docs, and the Python API docs.

asked Oct 10 '12 by TankorSmash



2 Answers

The following achieves what you asked for using the Python YouTube API:

from gdata.youtube import service

USERNAME = '[email protected]'
PASSWORD = 'a_very_long_password'
VIDEO_ID = 'wf_IIbT8HGk'

def comments_generator(client, video_id):
    # Fetch the first page of comments, then follow "next" links until exhausted.
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)

client = service.YouTubeService()
client.ClientLogin(USERNAME, PASSWORD)

for comment in comments_generator(client, VIDEO_ID):
    author_name = comment.author[0].name.text
    text = comment.content.text
    print("{}: {}".format(author_name, text))

Unfortunately the API limits the number of entries that can be retrieved to 1000. This is the error I got when I tried a tweaked version with a hand-crafted GetYouTubeVideoCommentFeed URL parameter:

gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}

Note that the same principle should apply to retrieve entries in other feeds of the API.

If you want to hand craft the GetYouTubeVideoCommentFeed URL parameter, its format is:

'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'

The following restrictions apply: start-index <= 1000 and max-results <= 50.
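For illustration, here's a minimal helper that builds such a feed URL and validates the parameters against those limits. The function name and the up-front validation are my own additions, not part of gdata; the API itself responds with a 400 Bad Request when the limits are exceeded.

```python
def comment_feed_url(video_id, start_index=1, max_results=50):
    """Hand-craft a YouTube comment feed URL, enforcing the documented limits."""
    if not 1 <= start_index <= 1000:
        raise ValueError("start-index must be between 1 and 1000")
    if not 1 <= max_results <= 50:
        raise ValueError("max-results must be between 1 and 50")
    return ('https://gdata.youtube.com/feeds/api/videos/{0}/comments'
            '?start-index={1}&max-results={2}').format(
                video_id, start_index, max_results)
```

The resulting string can then be passed as the positional URI argument to GetYouTubeVideoCommentFeed, as in the generator above.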

answered Sep 30 '22 by Pedro Romano

This is the only solution I've got for now. It doesn't use the API, and it gets slow when there are several thousand comments.

import bs4, urllib2
# grab the page source for the video
data = urllib2.urlopen(r'http://www.youtube.com/all_comments?v=video_id')  # example: XhFtHW4YB7M
# pull out the comments
soup = bs4.BeautifulSoup(data)
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})
# do something with them, e.g. count them
print len(cmnts)

Note that because 'class' is a reserved Python keyword, you can't pass it to findAll as a regular keyword argument; you have to go through the attrs dict instead, which rules out the usual 'startswith'-style searches via regex or lambda keyword shortcuts seen elsewhere. It also gets pretty slow due to BeautifulSoup, but it has to be used because etree and minidom don't find the matching tags for some reason, even after prettify()ing with bs4.
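That said, in my testing bs4 will accept a compiled regex as a value inside the attrs dict, so a prefix match on the class is still possible without the keyword-argument form. A minimal sketch, with made-up sample HTML standing in for the real page:

```python
import re
import bs4

# Fabricated HTML for illustration; the real page markup will differ.
html = ('<div class="comment yt-tile-default">first</div>'
        '<div class="comment yt-tile-default">second</div>'
        '<div class="sidebar">ignore me</div>')
soup = bs4.BeautifulSoup(html, 'html.parser')

# 'class' can't be a keyword argument, so it goes through attrs;
# a compiled regex is matched against each class token, so this
# finds every tag with a class starting with 'comment'.
cmnts = soup.findAll(attrs={'class': re.compile(r'^comment')})
print(len(cmnts))
```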

answered Sep 30 '22 by TankorSmash