function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/25.0.1");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIE, 'long cookie here');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
The original URL I'm feeding it is http://example.com/i-123.html, but if I open it in a browser, I get redirected to https://example.com/item-description-123.html (so I added CURLOPT_FOLLOWLOCATION).
However, the output of this function is binary data.
1f8b 0800 0000 0000 0003 ed7d e976 db38
f2ef e7f8 2930 9ac9 d86e 9b92 b868 f3a2
3e5e 9374 67fb c7ee 74f7 e4e6 f880 2428
31a6 4835 172f 3dd3 8f74 3fde 17b8 f7c5
6e15 008a 8ba8 2db1 3ce9 25a7 dba4 4810
......
How do I fix this? I tried adding
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
(copied from somewhere), but it didn't work.
file_get_contents() gives me the same output.
De-compressing curl output
Binary output may be the result of HTTP compression, which is often used to save bandwidth and speed up transmission.
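As a rough illustration of that point: the dump in the question starts with 1f8b, which is the gzip magic number, so a compressed body can be recognised and decoded by hand. The helper name maybe_gunzip below is hypothetical (it is not part of cURL or PHP), and the sketch assumes the zlib extension is available for gzdecode().

```php
<?php
// Hypothetical helper, not a built-in: returns the decoded body when $data
// looks like a gzip stream, otherwise returns $data unchanged.
function maybe_gunzip($data) {
    // Every gzip stream begins with the two magic bytes 0x1f 0x8b.
    if (strlen($data) >= 2 && substr($data, 0, 2) === "\x1f\x8b") {
        // gzdecode() needs the zlib extension; it returns false on bad input.
        $decoded = @gzdecode($data);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or not decodable): hand the data back untouched.
    return $data;
}
```

The same helper could be wrapped around the result of file_get_contents(), which does not decompress responses by itself.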
--data-binary is a flag specific to the curl command-line tool. It is not part of HTTP or of any web service; it tells curl to send the given data unmodified in the body of a POST request rather than in the headers.
Well, the solution was pathetic...
Using wget -S http://example.com, I found out that the content is compressed (gzipped). Using gunzip, I successfully extracted the HTML.
I also added this to my original PHP script:
curl_setopt($ch, CURLOPT_ENCODING, "");
And it worked like a charm.
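For reference, here is a minimal sketch of how the function from the question might look with that option in place. The name fetch_page and the error handling are my own additions, not part of the original script; the user agent and cookie lines are left out for brevity. Passing an empty string to CURLOPT_ENCODING makes cURL send an Accept-Encoding header listing every encoding it supports and decode the response automatically. The SSL options tried in the question are not needed for this, since they have no effect on content encoding.

```php
<?php
// Sketch only: fetch_page is a hypothetical name for the question's curl() function.
function fetch_page($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Empty string: accept all encodings cURL supports and decode them transparently.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $output = curl_exec($ch);
    if ($output === false) {
        // Added for illustration: surface transport errors instead of returning false.
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    curl_close($ch);
    return $output;
}
```

Calling fetch_page('http://example.com/i-123.html') should then return the HTML of the page rather than the raw gzip bytes, assuming the server behaves as described in the question.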