I have an php script which calls another web page and writes all the html of the page and everything goes ok however there is a charset problem. My php file encoding is utf-8 and all other php files work ok (that means there is no problem with server). What is the missing thing in that code and all spanish letters look weird. PS. When I wrote these weird characters original versions into php, they all look accurate.
header("Content-Type: text/html; charset=utf-8");
function file_get_contents_curl($url)
{
$ch=curl_init();
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
$data=curl_exec($ch);
curl_close($ch);
return $data;
}
$html=file_get_contents_curl($_GET["u"]);
$doc=new DOMDocument();
@$doc->loadHTML($html);
Simple:
When you use curl it encodes the string to utf-8
you just need to decode them..
Description
string utf8_decode ( string $data )
This function decodes data , assumed to be UTF-8
encoded, to ISO-8859-1
.
You Can use this header
header('Content-type: text/html; charset=UTF-8');
and after decoding the string
$page = utf8_decode(curl_exec($ch));
It worked for me
$output = curl_exec($ch);
$result = iconv("Windows-1251", "UTF-8", $output);
function page_title($val){
include(dirname(__FILE__).'/simple_html_dom.php');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$val);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$return = curl_exec($ch);
$encot = false;
$charset = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);
$html = str_get_html('"'.$return.'"');
if(strpos($charset,'charset=') !== false) {
$c = str_replace("text/html; charset=","",$charset);
$encot = true;
}
else {
$lookat=$html->find('meta[http-equiv=Content-Type]',0);
$chrst = $lookat->content;
preg_match('/charset=(.+)/', $chrst, $found);
$p = trim($found[1]);
if(!empty($p) && $p != "")
{
$c = $p;
$encot = true;
}
}
$title = $html->find('title')[0]->innertext;
if($encot == true && $c != 'utf-8' && $c != 'UTF-8') $title = mb_convert_encoding($title,'UTF-8',$c);
return $title;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With