I'm having a problem with PHP's cURL returning an empty string for some URLs. I'm trying to parse the OG metadata of different webpages, and it works with every website I've tried except NYTimes. Here is my code so far.
print_r(get_og_metadata('http://somewebsite.com'));
public function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    // the url to fetch
    curl_setopt($ch, CURLOPT_URL, $url);
    // return result as a string rather than direct output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // set max time to wait for the connection (not the whole transfer)
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
public function get_og_metadata($url)
{
    libxml_use_internal_errors(TRUE);
    // call the fetch method defined above (it is named get_data, not _get_data)
    $data = $this->get_data($url);
    $doc = new DOMDocument();
    $doc->loadHTML($data);
    $xpath = new DOMXPath($doc);
    $query = '//*/meta[starts-with(@property, \'og:\')]';
    $metadatas = $xpath->query($query);
    $result = array();
    foreach($metadatas as $metadata)
    {
        $property = $metadata->getAttribute('property');
        $content = $metadata->getAttribute('content');
        $result[$property] = $content;
    }
    return $result;
}
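For what it's worth, the parsing half can be checked on its own with a hard-coded HTML string, which helps tell a cURL problem apart from an XPath problem. Below is a minimal sketch, assuming a standalone function named parse_og_metadata (hypothetical, not part of the class above) and a made-up sample page. It also guards against empty input, since curl_exec() returns false on failure and DOMDocument::loadHTML() does not accept an empty string.

```php
<?php
// Hypothetical helper: extracts og:* <meta> tags from an HTML string.
function parse_og_metadata($html)
{
    // Bail out early on false (failed curl_exec) or empty/whitespace-only input.
    if (!is_string($html) || trim($html) === '') {
        return array();
    }

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = array();
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        $result[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
    return $result;
}

// Made-up sample markup, used only for illustration.
$sample = '<html><head>'
    . '<meta property="og:title" content="Example title">'
    . '<meta property="og:type" content="article">'
    . '</head><body></body></html>';

print_r(parse_og_metadata($sample));
```

If this prints the two og: entries but the real URL still yields nothing, the problem is most likely in the fetch step rather than in the query.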
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
My guess is that a site like the New York Times has protection against such behavior. Most likely this is based on the user agent, which you can fake like so:
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
This is one of the most common user agent strings, btw.
(That other answer is me also)
This is what did it for me. It was looking for SSL verification, which I happened to not need in this specific case.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
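Before switching verification off, it can help to see what cURL actually complained about. When curl_exec() fails it returns false, which prints as an empty string, and curl_errno() / curl_error() then hold the reason (a certificate problem would show up there). Here is a rough sketch, assuming a hypothetical helper named get_data_with_diagnostics; the 127.0.0.1:1 URL is just an assumed closed local port used to force a failure.

```php
<?php
// Hypothetical helper: like get_data(), but keeps cURL's error details.
function get_data_with_diagnostics($url, $timeout = 5)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // return result as a string rather than direct output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // max time to wait for the connection
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // max time for the whole request
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2);

    $body = curl_exec($ch);

    return array(
        'body'   => ($body === false) ? null : $body, // null means the request failed
        'errno'  => curl_errno($ch),                  // 0 when there was no error
        'error'  => curl_error($ch),                  // human-readable reason
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    );
}

// Assumed closed port, so the request fails without touching the network.
$r = get_data_with_diagnostics('http://127.0.0.1:1/');
if ($r['body'] === null) {
    echo "cURL error {$r['errno']}: {$r['error']}\n";
}
```

If the reported error turns out to be certificate-related, pointing CURLOPT_CAINFO at a CA bundle is an alternative to disabling the checks, though whether that applies depends on your setup.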