I want to fetch all the comments on a CNN article. CNN uses Disqus as its comment system. For example: http://edition.cnn.com/2013/02/25/tech/innovation/google-glass-privacy-andrew-keen/index.html?hpt=hp_c1
The comment widget only shows more comments after you click "load more". I tried parsing the HTML with PHP, but that does not return all the comments because they are loaded by JavaScript. Is there a more convenient way to retrieve all the comments for a specific CNN URL?
Has anyone done this successfully? Thanks in advance.
The Disqus API contains a pagination method using cursors that are returned in the JSON response. See here for information about cursors: http://disqus.com/api/docs/cursors/
Since you mentioned PHP, something like this should get you started:
<?php
$apikey = '<your key here>'; // get keys at http://disqus.com/api/ (a public or a secret key works for this endpoint)
$shortname = '<the disqus forum shortname>'; // defined in the page as var disqus_shortname = '...';
$thread = 'link:<URL of thread>'; // IMPORTANT: the URL you're viewing isn't necessarily the one stored with the thread of comments
//$thread = 'ident:<identifier of thread>'; // use this if 'link:' returns no results; defined in the page as var disqus_identifier = '...';
$limit = '100'; // max is 100 for this endpoint; the default is 25
$cursor = ''; // empty for the first page; later pages use the cursor returned by the API
// The endpoint ends with '&cursor=' so that listcomments() can append the current cursor
$endpoint = 'https://disqus.com/api/3.0/threads/listPosts.json?api_key='.$apikey.'&forum='.$shortname.'&thread='.urlencode($thread).'&limit='.$limit.'&cursor=';
$j=0;
listcomments($endpoint,$cursor,$j);
function listcomments($endpoint,$cursor,$j) {
// Standard CURL
$session = curl_init($endpoint.$cursor);
curl_setopt($session, CURLOPT_RETURNTRANSFER, 1); // instead of just returning true on success, return the result on success
$data = curl_exec($session);
curl_close($session);
// Decode JSON data
$results = json_decode($data);
if ($results === NULL) die('Error parsing json');
// Comment response
$comments = $results->response;
// Cursor for pagination
$cursor = $results->cursor;
$i=0;
foreach ($comments as $comment) {
$name = $comment->author->name;
$message = $comment->message; // separate variable so $comment stays the post object
$created = $comment->createdAt;
// Get more data...
echo "<p>".$name." wrote:<br/>";
echo $message."<br/>";
echo $created."</p>";
$i++;
}
// A full page of 100 suggests there may be more comments, so request the next page
if ($i == 100) {
$cursor = $cursor->next;
$i = 0;
listcomments($endpoint,$cursor,$j);
/* to cap the run at 10 pages, replace the call above with:
$j++;
if ($j < 10) {
listcomments($endpoint,$cursor,$j);
}*/
}
}
?>
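The same cursor loop can also be written iteratively instead of recursively. The following is only a minimal JavaScript sketch, not part of the original answer. It assumes the cursor object in each response carries `hasNext` and `next`, as described in the cursor documentation linked above. `collectAllPosts`, `buildListPostsUrl`, `fetchPage`, and `maxPages` are hypothetical names introduced for illustration.

```javascript
// Sketch: follow Disqus cursors until the API reports no further page.
// `fetchPage` is a hypothetical caller-supplied function that takes a cursor
// string and resolves to the decoded JSON body ({ response, cursor }).
async function collectAllPosts(fetchPage, maxPages = 50) {
  const posts = [];
  let cursor = '';
  for (let page = 0; page < maxPages; page++) {
    const body = await fetchPage(cursor);
    posts.push(...body.response);
    // Stop when the cursor object says nothing follows this page
    // (assumes the `hasNext` / `next` fields from the cursor docs).
    if (!body.cursor || !body.cursor.hasNext) break;
    cursor = body.cursor.next;
  }
  return posts;
}

// Builds the listPosts URL for one page; URLSearchParams encodes every value,
// including the ':' and '/' characters inside a 'link:<URL>' thread reference.
function buildListPostsUrl(apiKey, forum, thread, cursor, limit = 100) {
  const params = new URLSearchParams({
    api_key: apiKey,
    forum: forum,
    thread: thread,
    limit: String(limit),
  });
  if (cursor) params.set('cursor', cursor);
  return 'https://disqus.com/api/3.0/threads/listPosts.json?' + params.toString();
}

// Possible usage in a runtime that provides fetch (placeholders left unfilled):
// collectAllPosts((c) =>
//   fetch(buildListPostsUrl('<your key here>', '<shortname>', 'link:<URL of thread>', c))
//     .then((r) => r.json())
// ).then((posts) => console.log(posts.length));
```

Passing the fetcher in as a function keeps the paging logic separate from the HTTP client, and `maxPages` plays the same role as the `$j` counter in the PHP version.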
Just an addition: to get the URL of the Disqus comment frame on any page that embeds Disqus, run this JavaScript in the browser console:
var visit = (function () {
// The Disqus embed renders its comments inside an iframe under #disqus_thread
var url = document.querySelector('div#disqus_thread iframe').src;
// Upgrade a plain http URL to https without patching String.prototype
if (url.indexOf('http://') === 0) return 'https://' + url.slice(7);
return url;
})();
The URL is now stored in 'visit', so print it:
console.log(visit);
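As an alternative to slicing the string by hand, the scheme can be changed with the standard `URL` class. This is only a small sketch, and `toHttps` is a hypothetical helper name, not something from the answer above. It assumes the input is already an absolute URL, which is what `iframe.src` returns.

```javascript
// Sketch: force an absolute URL onto https using the standard URL class.
function toHttps(rawUrl) {
  const url = new URL(rawUrl); // throws a TypeError if rawUrl is not absolute
  if (url.protocol === 'http:') url.protocol = 'https:';
  return url.href;
}
```

In the console this could be used as `toHttps(document.querySelector('div#disqus_thread iframe').src)`.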