I'm looking to build a PHP script that parses HTML for particular tags. I've been using this code block, adapted from this tutorial:
<?php
$data = file_get_contents('http://www.google.com');
$regex = '/<title>(.+?)</';
preg_match($regex,$data,$match);
var_dump($match);
echo $match[1];
?>
The script works with some websites (like google, above), but when I try it with other websites (like, say, freshdirect), I get this error:
"Warning: file_get_contents(http://www.freshdirect.com) [function.file-get-contents]: failed to open stream: HTTP request failed!"
I've seen a bunch of great suggestions on StackOverflow, for example to enable extension=php_openssl.dll
in php.ini. But (1) my version of php.ini didn't have extension=php_openssl.dll
in it, and (2) when I added it to the extensions section and restarted the WAMP server, per this thread, still no success.
Would someone mind pointing me in the right direction? Thank you very much!
It just requires a user-agent ("any" really, any string suffices):
file_get_contents("http://www.freshdirect.com",false,stream_context_create(
array("http" => array("user_agent" => "any"))
));
See more options.
Of course, you can set user_agent
in your ini:
ini_set("user_agent","any");
echo file_get_contents("http://www.freshdirect.com");
... but I prefer to be explicit for the next programmer working on it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With