
Why is file_get_contents returning strange characters?

Tags:

php

parsing

I am trying to parse http://www.desi-tashan.com/category/pakistan-tvs/aaj-tv/3-idiots/ with file_get_contents.

But it returns a string full of unusual characters and symbols.

Whereas if I fetch http://www.desi-tashan.com/ it works fine. Could someone tell me why this is happening?

Is there some encoding or decoding involved?

The page seems to be built with WordPress.

Abul Hasnat asked Dec 21 '22 15:12


1 Answer

The content you are seeing is gzip-compressed.

You might be interested in gzdecode or zlib_decode. (Please note that Zlib support in PHP is not enabled by default.)

Your code might look like this:

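$url = 'http://www.desi-tashan.com/category/pakistan-tvs/aaj-tv/3-idiots/';
// The body arrives still gzip-compressed, which is why it looks like garbage.
$content = file_get_contents($url);
// gzdecode() returns false if the data is not valid gzip.
$decoded_content = gzdecode($content); // or zlib_decode($content);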
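
Since the home page comes back readable while the category page does not, it may help to decode only when the body really is gzip data. The following is a minimal sketch, not a definitive implementation: maybe_gunzip is a hypothetical helper name, and it assumes that a body starting with the two gzip magic bytes (0x1f 0x8b) is gzip-compressed.

```php
<?php
// Hypothetical helper: decode only when the data starts with the gzip magic bytes.
function maybe_gunzip(string $data): string
{
    // Every gzip stream begins with the bytes 0x1f 0x8b.
    if (strncmp($data, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($data);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or decoding failed): hand back the data unchanged.
    return $data;
}
```

With this in place, $html = maybe_gunzip(file_get_contents($url)); could be used for both URLs from the question, because an uncompressed body passes through untouched.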

Another solution, posted elsewhere on Stack Overflow, adds an Accept-Encoding HTTP header to the request, telling the server NOT to gzip the response.

However, it doesn't work on www.desi-tashan.com: the server ignores the Accept-Encoding header and always returns gzipped content.
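
For servers that do honor the header, the request could be sketched roughly as below, with a fallback for servers that compress anyway. This is an illustration under assumptions: response_is_gzipped and fetch_plain are hypothetical helper names, and the sketch relies on the HTTP wrapper filling in $http_response_header after file_get_contents (on a redirected request that array also holds the headers of the earlier responses, which this sketch does not separate out).

```php
<?php
// Hypothetical helper: true if any response header line declares gzip encoding.
function response_is_gzipped(array $headers): bool
{
    foreach ($headers as $line) {
        if (preg_match('/^Content-Encoding:\s*(x-)?gzip\s*$/i', $line)) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: ask for an uncompressed body, but decode it
// if the server ignores the request and sends gzip anyway.
function fetch_plain(string $url)
{
    $context = stream_context_create([
        'http' => ['header' => "Accept-Encoding: identity\r\n"],
    ]);
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        return false; // request failed
    }
    // Populated in this scope by the HTTP stream wrapper.
    $headers = $http_response_header ?? [];
    return response_is_gzipped($headers) ? gzdecode($body) : $body;
}
```

Calling fetch_plain($url) would then return readable HTML either way, or false if the request or the decoding fails.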

Neverever answered Jan 03 '23 05:01