I need to crawl all of the comments (more than 2,600,000 comments, over 5,000 pages) on PSY's Gangnam Style video on YouTube; see: http://www.youtube.com/all_comments?v=9bZkp7q19f0
The problems are:
1) If I use the GData service, Google returns no more than 1,000 comments in the feed.
2) If I crawl the HTML directly from
http://www.youtube.com/all_comments?v=9bZkp7q19f0&page=$(page)
by incrementing the page parameter, it fails after page #101: from there on, no comments are displayed on the page.
How can I get around this problem?
P.S. My crawler is implemented as a Chrome extension in JavaScript; it reads the comment tags from the loaded page and then loads the next page.
You may be able to extract the data by crawling the pages and working around each problem as you hit it, but that is not the proper way.
You should use the YouTube API for this, and check the other developer resources related to it.
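As a rough illustration of the API route, here is a minimal sketch of token-based paging. It assumes the YouTube Data API v3 commentThreads.list endpoint and an API key, neither of which is mentioned in the question; buildPageUrl, collectComments and the injected fetchJson callback are hypothetical names made up for this example. Whether the API will actually return every one of the 2.6 million comments is not something this sketch can promise, and request quotas will apply.

```javascript
// Sketch only: page through comment threads by following nextPageToken.
// Assumption: YouTube Data API v3 commentThreads.list and a valid API key.
const API_URL = 'https://www.googleapis.com/youtube/v3/commentThreads';

// Build the request URL for one page of top-level comments.
function buildPageUrl(videoId, apiKey, pageToken) {
  const params = new URLSearchParams({
    part: 'snippet',
    videoId: videoId,
    maxResults: '100', // largest page size this endpoint accepts
    textFormat: 'plainText',
    key: apiKey,
  });
  if (pageToken) {
    params.set('pageToken', pageToken);
  }
  return API_URL + '?' + params.toString();
}

// Collect comment texts page by page until no nextPageToken is returned.
// fetchJson (hypothetical) takes a URL and resolves to the parsed JSON body;
// it is injected so the loop can be exercised without network access.
async function collectComments(videoId, apiKey, fetchJson, maxPages = Infinity) {
  const comments = [];
  let pageToken = null;
  let pages = 0;
  do {
    const data = await fetchJson(buildPageUrl(videoId, apiKey, pageToken));
    for (const item of data.items || []) {
      comments.push(item.snippet.topLevelComment.snippet.textDisplay);
    }
    pageToken = data.nextPageToken || null;
    pages += 1;
  } while (pageToken && pages < maxPages);
  return comments;
}
```

In a browser or extension context, fetchJson could be something like `url => fetch(url).then(r => r.json())`, and maxPages gives a way to stop early while testing. Unlike the page=$(page) approach, each request here depends on the token returned by the previous one, so pages cannot be fetched out of order.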