
How to decode "Content-Encoding: gzip, gzip" using curl?

I am trying to fetch and decode the web page www.dealstan.com with cURL, using the code below:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");     
curl_setopt($ch, CURLOPT_TIMEOUT,5); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch); 
$info = curl_getinfo($ch); 
curl_close($ch); 

$html = str_get_html("$return");
echo $html;

But it prints junk characters like the following,

"��}{w�6����9�X�n���.........." for about 100 lines.

I inspected the response in hurl.it and noticed something interesting: the HTML appears to be gzip-encoded twice (just a guess, based on the response headers).

The response is below:

GET http://www.dealstan.com/

200 OK (18.87 kB, 490 ms)

Headers:

Cache-Control: max-age=0, no-cache
Cf-Ray: 18be7f54f8d80f1b-IAD
Connection: keep-alive
Content-Encoding: gzip, gzip   <== suspecting this; does anyone know about it?
Content-Type: text/html; charset=UTF-8
Date: Wed, 19 Nov 2014 18:33:39 GMT
Server: cloudflare-nginx
Set-Cookie: __cfduid=d1cff1e3134c5f32d2bddc10207bae0681416422019; expires=Thu, 19-Nov-15 18:33:39 GMT; path=/; domain=.dealstan.com; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Page-Speed: 1.8.31.2-3973
X-Pingback: http://www.dealstan.com/xmlrpc.php
X-Powered-By: HHVM/3.2.0

Body (raw):

H4sIAAAAAAAAA5V8Q5AoWrBk27Ztu/u2bdu2bdu2bdu2bds2583f/pjFVOQqozZnUxkVJ7PwoyAA/qeAb3y83LbYHs/3Hv79wKm/2N5cZyJVtCWu1xyteyzLNqYuWbdtHeELCyIZRRp/1Fe7es3+wL3Vfb

Does anyone know how to decode a response with the header "Content-Encoding: gzip, gzip"?

The site loads properly in Firefox, Chrome, etc., but I am not able to decode it using cURL.

Please help me resolve this issue.

asked Nov 19 '14 at 19:11 by stackguy



1 Answer

You can decode it by trimming off the gzip header (the first 10 bytes of the returned data) and passing the rest to gzinflate.

$url = "http://www.dealstan.com";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip");     
curl_setopt($ch, CURLOPT_TIMEOUT, 5); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch); 
$info = curl_getinfo($ch); 
curl_close($ch); 

// Skip the 10-byte gzip header of the remaining layer, then inflate the deflate stream
$return = gzinflate(substr($return, 10));
print_r($return);
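
The fixed 10-byte offset above assumes a gzip header with no optional fields. As a rough alternative sketch, gzdecode parses the gzip header itself, and checking for the gzip magic bytes (0x1f 0x8b) lets you peel off layers until none remain. The helper name decode_nested_gzip and the $maxLayers limit are hypothetical and not part of the original answer, and the demo compresses a string locally instead of fetching the site.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// remove gzip layers while the data still starts with the gzip magic bytes.
function decode_nested_gzip(string $data, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers; $i++) {
        // Every gzip stream begins with the two bytes 0x1f 0x8b.
        if (strncmp($data, "\x1f\x8b", 2) !== 0) {
            break; // no gzip layer left
        }
        $decoded = @gzdecode($data);
        if ($decoded === false) {
            break; // looked like gzip but could not be decoded; keep what we have
        }
        $data = $decoded;
    }
    return $data;
}

// Self-contained demonstration using locally double-compressed data.
$original = '<html><body>Hello</body></html>';
$twice = gzencode(gzencode($original));
echo decode_nested_gzip($twice), PHP_EOL;
```

With the cURL code above, something like `$return = decode_nested_gzip(curl_exec($ch));` should work whether or not cURL has already removed one of the layers, since data that is no longer gzip is returned unchanged.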
answered Oct 22 '22 at 01:10 by Nalin Singapuri