
Empty response when start-index >= 100

Tags:

youtube-api

After a lot of debugging, it finally occurred to me that YouTube seemingly issues only the first 100 comments when the v2 YouTube API is used to fetch comments. I then tried:

curl -Lk -X GET "http://gdata.youtube.com/feeds/api/videos/MShbP3OpASA/comments?alt=json&start-index=100&max-results=50"

All I get back is a response without an "entry" field. That is, I don't receive an error response or anything like that; I get a perfectly valid response, just without the "entry" field.

Digging a little deeper, the value of openSearch$totalResults in my response is 100, so according to this resource this seems to be the expected result (although the resource mentions some kind of error message, which I don't get).

But here comes the kicker: When I use

curl -Lk -X GET "http://gdata.youtube.com/feeds/api/videos/MShbP3OpASA/comments?alt=json&start-index=1&max-results=50&orderby=published"

openSearch$totalResults equals 3141, the actual count of the comments.
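A minimal Python sketch of the check described above, assuming the v2 JSON layout these responses use (scalar values wrapped in {"$t": ...} under "feed", and "entry" simply missing when a page holds no comments). The function name summarize_feed is hypothetical:

```python
def summarize_feed(response):
    """Return (openSearch$totalResults, number of entries) for a parsed v2 feed.

    Assumes the GData JSON layout: scalar values wrapped as {"$t": value}.
    """
    feed = response["feed"]
    # The total may come back as a number or a string; int() accepts either.
    total = int(feed["openSearch$totalResults"]["$t"])
    # A page past the end has no "entry" key at all, rather than an error.
    entries = feed.get("entry", [])
    return total, len(entries)
```

For the start-index=100 request this would report (100, 0), while the orderby=published request would report 3141 as the total.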

Now here is my question: since the v2 API was officially deprecated about a week ago, is it possible that Google simply put a limit on the comments, so that only the first 100 are accessible? Since the v3 API does not support retrieving comments, that would be a real bummer for me.

Does anyone have any ideas?

alex asked Mar 09 '14 21:03



1 Answer

I've figured out how to retrieve all the comments using the navigation links embedded in the json response.

Suppose you retrieve the first page of comments using a URL like this (Python here, but you get the point):

r'https://gdata.youtube.com/feeds/api/videos/' + aVideoID + r'/comments?alt=json&start-index=1&max-results=50&prettyprint=true&orderby=published'
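The same first-page URL can be built with the standard library instead of string concatenation. This is only a sketch: the helper name first_page_url is hypothetical, and the parameters simply mirror the URL above (prettyprint only affects readability).

```python
from urllib.parse import urlencode

BASE = "https://gdata.youtube.com/feeds/api/videos/"

def first_page_url(video_id, page_size=50):
    """Build the first-page comments URL (hypothetical helper)."""
    params = {
        "alt": "json",
        "start-index": 1,
        "max-results": page_size,
        "prettyprint": "true",
        # orderby=published keeps the later "next" links short (see below).
        "orderby": "published",
    }
    # urlencode keeps the dict's insertion order and leaves '-' unescaped.
    return BASE + video_id + "/comments?" + urlencode(params)
```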

Embedded in the JSON under "feed" (and before the comments) is a four-element array called "link". The fourth element has "rel": "next", and its "href" holds a link you can use to get the next 50 comments. The link will look something like:

https://gdata.youtube.com/feeds/api/videos/fH0cEP0mvlU/comments?alt=json&orderby=published&alt=json&start-token=EgkI2NqyoZDRvgIosK%2FPosPRvgIw653cmsXRvgI4AUAC&max-results=50&orderby=published

for an original URL of:

https://gdata.youtube.com/feeds/api/videos/fH0cEP0mvlU/comments?alt=json&start-index=1&max-results=50&prettyprint=true&orderby=published
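A small sketch of pulling that link out of the parsed feed. It looks the entry up by its "rel" value rather than assuming it is the fourth element, which is a defensive choice on my part, not something the API guarantees either way. The name next_link is hypothetical:

```python
def next_link(feed):
    """Return the href of the rel="next" link in a parsed v2 feed, or None."""
    # Search by rel value instead of relying on the link's position in the array.
    for link in feed.get("link", []):
        if link.get("rel") == "next":
            return link.get("href")
    # No "next" link means this was the last page.
    return None
```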

If you follow the next link, it returns JSON similar to the original response, with another 50 comments. Repeat this until you have all the comments (my code stops when the "next" link is missing from the JSON or when the JSON contains zero comments).
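The loop described above could look roughly like this in Python, assuming the endpoint responds as described in this answer. The names fetch_json and fetch_all_comments are hypothetical, and the fetch function is a parameter so the loop can be exercised on canned data:

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    """Download a URL and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)

def fetch_all_comments(first_url, fetch=fetch_json):
    """Follow rel="next" links, collecting every comment entry."""
    comments = []
    url = first_url
    while url:
        feed = fetch(url)["feed"]
        entries = feed.get("entry", [])
        # Stop condition 1: a page with zero comments.
        if not entries:
            break
        comments.extend(entries)
        # Stop condition 2: no "next" link on this page (url stays None).
        url = None
        for link in feed.get("link", []):
            if link.get("rel") == "next":
                url = link.get("href")
                break
    return comments
```

Typical use would be fetch_all_comments(first_url) with the orderby=published URL from above.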

You need "&orderby=published" in the original URL because otherwise the "next" links eventually grow too large and cause an error: with the default orderby, something in the start-token the API uses to track which comments you've already seen takes up a lot of space. With orderby=published the start-token stays small, whereas with the default orderby you will start getting "414 Request-URI Too Long" errors after about 500 comments.

Hope this helps.

davec answered Oct 05 '22 04:10
