I am trying to parse http://www.desi-tashan.com/category/pakistan-tvs/aaj-tv/3-idiots/ with file_get_contents,
but it returns very unusual characters and symbols,
whereas if I parse http://www.desi-tashan.com/ it works fine. Could someone tell me why this is happening?
Is there any encoding or decoding involved?
The page seems to be made with WordPress.
The content you see is gzipped.
You might be interested in looking at gzdecode
or zlib_decode.
(Please note that Zlib support in PHP is not enabled by default.)
Your code might look like this:
$url = 'http://www.desi-tashan.com/category/pakistan-tvs/aaj-tv/3-idiots/';
$content = file_get_contents($url);
$decoded_content = gzdecode($content); // or zlib_decode($content);
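Since the question notes that the home page comes back readable while the category page does not, it may be safer to decode only when the body really is gzip data. The following is a minimal sketch, not a definitive implementation: the helper name maybe_gunzip is hypothetical, and it assumes the zlib extension is loaded. It relies on the fact that a gzip stream starts with the two bytes 0x1f 0x8b.

```php
<?php
// Hypothetical helper (not part of PHP): decode $body only if it looks gzipped.
// Assumes the zlib extension is available, since gzdecode() comes from it.
function maybe_gunzip(string $body): string
{
    // Every gzip stream begins with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        // @ silences the warning gzdecode() raises on corrupt data;
        // in that case it returns false and we fall through.
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or not decodable): hand back the body unchanged.
    return $body;
}

// Possible usage with the URL from the question:
// $content = file_get_contents($url);
// if ($content !== false) {
//     $html = maybe_gunzip($content);
// }
```

With this guard the same code path can handle both URLs from the question, whichever of them the server decides to compress.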
Another solution here on Stack Overflow adds the HTTP header Accept-Encoding
to the request, telling the server NOT to gzip.
However, it doesn't work on www.desi-tashan.com:
the server ignores the Accept-Encoding
header and always returns gzipped content.
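For servers that do honour the header, the Accept-Encoding approach could be sketched roughly as below. This is an illustration under assumptions: the function name no_gzip_context is hypothetical, and the header value "identity" is my choice for asking for an uncompressed body, not necessarily what the linked answer used. As noted above, this particular site ignores the header, so decoding the response is still needed there.

```php
<?php
// Hypothetical helper: build a stream context that asks the server
// for an uncompressed response via the Accept-Encoding request header.
function no_gzip_context()
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            // "identity" means "no content encoding"; a server is free to ignore it.
            'header' => "Accept-Encoding: identity\r\n",
        ],
    ]);
}

// Possible usage:
// $content = file_get_contents($url, false, no_gzip_context());
```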