I am scraping a list of RSS feeds by using cURL, and then I am reading and parsing the RSS data with SimpleXML. The sorted data is then inserted into a mySQL database.
However, as notice on http://dansays.co.uk/research/MNA/rss.php I am having several issues with characters not displaying correctly.
Examples:
âGuitar Hero: Van Halenâ Trailer And Tracklist Available
NV 10/10/09 – Salt Lake City, UT 10/11/09 – Denver, CO 10/13/09 –
I have tried using htmlentities and htmlspecialchars on the data before inserting them into the database, but it doesn't seem to help resolve issue.
How could I possibly resolve this issue I am having?
Thanks for any advices.
Updated
I've tried what Greg suggested, and the issue is still here...
Here is the code I used to do SET NAMES in PDO:
$dbh = new PDO($dbstring, $username, $password);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->query('SET NAMES "utf8"');
I did a bit of echo'ing with the simplexml data before it is sorted and inserted into the database, and I now believe it is something to do with the cURL...
Here is what I have for cURL:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');
$data = curl_exec($ch);
curl_close($ch);
$doc = new SimpleXmlElement($data, LIBXML_NOCDATA);
Issue Resolved
I had to set the content charset in the RSS/HTML page to "UTF-8" to resolve this issue. I guess this isn't a real fix as the char problems are still there in the raw data. Looking forward to proper support for it in PHP6!
Your page is being served as UTF-8 so I'd point my finger at the database.
Make sure the connection is in UTF-8 before any SELECTs or INSERTS - in MySQL:
SET NAMES "utf8"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With