I am trying to fetch and decode the web page www.dealstan.com with cURL, using the code below:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$html = str_get_html("$return");
echo $html;
But it prints junk characters:
"��}{w�6����9�X�n���.........." for about 100 lines.
I inspected the response with hurl.it and noticed one interesting point: it looks like the HTML is encoded twice (just a guess, based on the response headers).
Here is the response:
GET http://www.dealstan.com/
200 OK 18.87 kB 490 ms
HEADERS
Cache-Control: max-age=0, no-cache
Cf-Ray: 18be7f54f8d80f1b-IAD
Connection: keep-alive
Content-Encoding: gzip, gzip   <== I suspect this header; does anyone know what it means?
Content-Type: text/html; charset=UTF-8
Date: Wed, 19 Nov 2014 18:33:39 GMT
Server: cloudflare-nginx
Set-Cookie: __cfduid=d1cff1e3134c5f32d2bddc10207bae0681416422019; expires=Thu, 19-Nov-15 18:33:39 GMT; path=/; domain=.dealstan.com; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Page-Speed: 1.8.31.2-3973
X-Pingback: http://www.dealstan.com/xmlrpc.php
X-Powered-By: HHVM/3.2.0
BODY
H4sIAAAAAAAAA5V8Q5AoWrBk27Ztu/u2bdu2bdu2bdu2bds2583f/pjFVOQqozZnUxkVJ7PwoyAA/qeAb3y83LbYHs/3Hv79wKm/2N5cZyJVtCWu1xyteyzLNqYuWbdtHeELCyIZRRp/1Fe7es3+wL3Vfb
Does anyone know how to decode a response with the header "Content-Encoding: gzip, gzip"?
The site loads properly in Firefox, Chrome, etc., but I am not able to decode it using cURL.
Please help me solve this issue.
Last week I detailed how I enabled gzip encoding on nginx servers, the same server software I use on this site. Enabling gzip on your server can noticeably improve site load time, thus improving user experience and (hopefully) Google page ranks.
Delete the headers and what you'll have left is gzip-compressed data that can be decompressed with gzip -d or zcat. For example, a sed script can delete the headers, i.e. everything from the first line to the first empty line (/^[[:space:]]*$/).
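The header-stripping step described above can also be done in PHP instead of sed and gzip -d. The following is only a minimal sketch: the function name strip_headers_and_gunzip and the fake response are made up for illustration, it assumes PHP 7+ with the zlib extension, and it assumes the raw response contains a single header block followed by one gzip layer.

```php
<?php
// Hypothetical helper: drop the HTTP headers (everything up to the first
// empty line) and gunzip what is left, similar to `sed ... | gzip -d`.
function strip_headers_and_gunzip(string $rawResponse): string
{
    // Headers end at the first blank line; accept CRLF or bare LF.
    // The limit of 2 keeps the (binary) body in one piece.
    $parts = preg_split('/\r?\n[ \t]*\r?\n/', $rawResponse, 2);
    if (count($parts) < 2) {
        throw new RuntimeException('No blank line found after the headers');
    }
    $body = gzdecode($parts[1]);
    if ($body === false) {
        throw new RuntimeException('Body is not valid gzip data');
    }
    return $body;
}

// Self-contained demo: build a fake gzip-encoded response and decode it.
$raw = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n" . gzencode('<html>hello</html>');
echo strip_headers_and_gunzip($raw), "\n";
```

With cURL, a raw response that still includes its headers would come from setting CURLOPT_HEADER and leaving CURLOPT_ENCODING unset, so that cURL does not decompress anything itself.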
From the above response, you can see that the page was served gzipped via the Content-Encoding: gzip header. You can check individual files instead of pages to ensure they have been gzipped as well.
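To check which codings a response declares, the Content-Encoding header can be parsed from the raw header text. This is a rough sketch, not a full HTTP parser: content_encodings is a hypothetical helper name, the header text is a hard-coded sample modelled on the response in the question, and PHP 7+ is assumed.

```php
<?php
// Hypothetical helper: list the codings named in Content-Encoding headers.
function content_encodings(string $rawHeaders): array
{
    $codings = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive.
        if (preg_match('/^Content-Encoding:(.*)$/i', $line, $m)) {
            // A single header may carry several comma-separated codings.
            foreach (explode(',', $m[1]) as $coding) {
                $coding = strtolower(trim($coding));
                if ($coding !== '') {
                    $codings[] = $coding;
                }
            }
        }
    }
    return $codings;
}

// Sample headers; a "gzip, gzip" value yields two entries.
$headers = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip, gzip\r\nContent-Type: text/html; charset=UTF-8\r\n";
print_r(content_encodings($headers));
```

Two entries in the result mean the body was compressed twice, so it has to be decompressed twice as well. For a live check, the header text could be fetched with cURL by enabling CURLOPT_HEADER (and CURLOPT_NOBODY if only the headers are wanted).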
You can decode it by trimming off the 10-byte gzip header of the remaining layer and calling gzinflate on the rest.
$url = "http://www.dealstan.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
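// cURL has already removed one gzip layer; skip the 10-byte gzip header of the second layer and inflate the raw deflate data
$return = gzinflate(substr($return, 10));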
print_r($return);
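The substr($return, 10) approach assumes a plain 10-byte gzip header and exactly one layer left after cURL has done its part. A possibly more forgiving variant is to keep calling gzdecode while the data still starts with the gzip magic bytes. This is only a sketch under assumptions: gunzip_all_layers is a hypothetical helper, gzdecode needs PHP 5.4+ (the type hints need PHP 7), and the double-gzipped body is simulated locally rather than fetched from the site.

```php
<?php
// Hypothetical helper: peel gzip layers until the data no longer starts
// with the gzip magic bytes (0x1f 0x8b) or $maxLayers is reached.
function gunzip_all_layers(string $data, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers && strncmp($data, "\x1f\x8b", 2) === 0; $i++) {
        $decoded = @gzdecode($data);
        if ($decoded === false) {
            break; // looked like gzip but was not; keep what we have
        }
        $data = $decoded;
    }
    return $data;
}

// Self-contained demo: simulate a body that was gzipped twice.
$twice = gzencode(gzencode('<html>deal</html>'));
echo gunzip_all_layers($twice), "\n";
```

In the code above, this would be called as gunzip_all_layers($return) in place of the gzinflate line. If cURL has already removed every layer, the HTML does not start with the magic bytes and the helper returns it unchanged.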