I am using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) to fetch data such as the page title, meta description and meta tags from other domains and then insert it into a database.
But I have some encoding issues: I do not get the correct characters from websites that are not in English.
Below is the code:
<?php
require 'init.php';
$curl = new curl();
$html = new simple_html_dom();
$page = $_GET['page'];
$curl_output = $curl->getPage($page);
$html->load($curl_output['content']);
$meta_title = $html->find('title', 0)->innertext;
print $meta_title . "<hr />";
// print $html->plaintext . "<hr />";
?>
Output for the facebook.com page:
Welcome to Facebook — Log in, sign up or learn more
Output for the amazon.cn page:
亚马逊-网上è´ç‰©å•†åŸŽï¼šè¦ç½‘è´, å°±æ¥Z.cn!
Output for the mail.ru page:
Mail.Ru: почта, поиÑк в интернете, новоÑти, игры, развлечениÑ
So the characters are not being encoded properly.
Can anyone help me solve this issue so that I can store the correct data in my database?
@deceze and @Shakti thanks for your help.
+1 for the article link posted by deceze (Handling Unicode Front to Back in a Web App); it is also worth reading Understanding encoding.
After reading your comments, answer and of course those two articles, I finally solved my issue.
I have listed the steps I took to solve this issue:
1. Added header('Content-Type: text/html; charset=utf-8'); at the top of my init.php file.
2. Set the database connection charset: mysql_set_charset('utf8', $connection_link_id);
3. Encoded the title for output: $meta_title = htmlentities(trim($meta_title_raw), ENT_QUOTES, 'UTF-8');
Now the issue seems to be solved, but I still have to do the following to solve it in full:
1. Find the encoding of the source page and store it in $source_charset.
2. Convert the text to UTF-8 using iconv(). Example: iconv($source_charset, "UTF-8", $meta_title_raw);
To get $source_charset I will probably have to use some tricks or multiple checks, such as checking the HTTP headers and the meta tags. I found a good answer at Detect encoding.
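For what it's worth, here is a rough sketch of how the charset detection and the iconv() conversion might fit together. The function names detect_charset() and to_utf8() are hypothetical, not part of Simple HTML DOM or my curl class. The ISO-8859-1 fallback and the order of the checks (HTTP header first, then meta tag, then a UTF-8 validity test) are assumptions, so treat this as a starting point rather than a complete detector.

```php
<?php
// Hypothetical helper: guess the charset of a fetched page.
// $content_type_header is the raw Content-Type response header (may be empty).
function detect_charset(string $content_type_header, string $html, string $fallback = 'ISO-8859-1'): string
{
    // 1. HTTP header, e.g. "text/html; charset=windows-1251".
    if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $content_type_header, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 3. No declaration found: if the bytes are valid UTF-8, assume UTF-8.
    if (preg_match('//u', $html) === 1) {
        return 'UTF-8';
    }
    // 4. Otherwise fall back to an assumed default.
    return $fallback;
}

// Hypothetical helper: convert $text from $source_charset to UTF-8.
function to_utf8(string $text, string $source_charset): string
{
    if (strcasecmp($source_charset, 'UTF-8') === 0) {
        return $text; // already UTF-8, nothing to convert
    }
    // iconv() returns false (and raises a notice/warning) on failure,
    // e.g. for an unknown charset name; keep the raw text in that case.
    $converted = @iconv($source_charset, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

// Usage: a title served as windows-1251 (the bytes below spell "почта").
$meta_title_raw = "\xEF\xEE\xF7\xF2\xE0";
$source_charset = detect_charset('text/html; charset=windows-1251', '');
$meta_title = htmlentities(to_utf8($meta_title_raw, $source_charset), ENT_QUOTES, 'UTF-8');
```

With the curl wrapper from the question, the header value would have to come from the response headers, assuming the wrapper exposes them. The header is checked first here on the assumption that the server knows what it sends, but pages with a wrong header and a correct meta tag do exist, so the order may need tuning.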
Let me know if you see any possible improvements to, or faults in, the steps above.