
Strange Characters in database text: Ã, Ã, ¢, â‚ €

I'm not certain when this first occurred.

I have a new drop-shipping affiliate website, and receive an exported copy of the product catalog from the wholesaler. I format and import this into Prestashop 1.4.4.

The front end of the website contains combinations of strange characters inside product text: Ã, Ã, ¢, â‚ etc. They appear in place of common characters like , - : etc.
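
For readers who have not seen this before, sequences like these are what typically appears when UTF-8 bytes are read with a single-byte Western codec. Here is a minimal Python sketch of that mechanism, assuming the wrong codec is cp1252 (the question does not confirm which codec is involved, and the function name is made up for illustration):

```python
def misdecode(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec.

    Assumption: the reader of the data believes it is cp1252. Each
    multi-byte UTF-8 character then turns into two or three characters.
    """
    return text.encode("utf-8").decode(wrong_codec)
```

With this helper, misdecode("é") gives "Ã©", misdecode("’") gives "â€™", and misdecode("€") gives "â‚¬". Plain ASCII punctuation such as , - : passes through unchanged.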

These characters are present in about 40% of the database tables, not just product-specific tables like ps_product_lang.

Another website thread says this same problem occurs when the database connection string uses an incorrect character encoding type.

In /config/setting.inc, there is no character encoding string mentioned, just the MySQL Engine, which is set to InnoDB, which matches what I see in PHPMyAdmin.

I exported ps_product_lang, replaced all instances of these characters with the correct characters, saved the CSV file as UTF-8, and reimported it using PHPMyAdmin, specifying UTF-8 as the file's character set.

However, after doing a new search in PHPMyAdmin, I now have about 10 times as many instances of these bad characters in ps_product_lang as I started with.

If the problem is as simple as specifying the correct character encoding in the database connection string, where and how do I set it, and to what value?

Incidentally, I tried running this command, mentioned in this thread, in PHPMyAdmin, but the problem remains:

SET NAMES utf8 

UPDATE: PHPMyAdmin says:

MySQL charset: UTF-8 Unicode (utf8)

This is the same character set I used for the last import file, which caused more character corruption. UTF-8 was specified as the charset of the import file during the import process.

UPDATE2

Here is a sample:

people are truly living untetheredÃƒÆ’Ã‚Â¢ÃƒÂ¢Ã¢â‚¬Å¡Ã‚Â¬ÃƒÂ¯Ã¢â‚¬Â Ã‚ï† buying and renting movies online, downloading software, and sharing and storing files on the web.
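
The ÃƒÆ’Ã‚ run in this sample looks like what you get when already-garbled text goes through the same wrong decoding again. Here is a rough Python sketch of that layering, assuming each export/import cycle reads UTF-8 bytes as cp1252 (an assumption, not something the question confirms; the function name is hypothetical):

```python
def misdecode_rounds(text: str, rounds: int, wrong_codec: str = "cp1252") -> str:
    """Apply the 'UTF-8 bytes read as wrong_codec' mistake `rounds` times.

    Assumption: every round makes the same mistake. Each non-ASCII
    character grows into two or three characters per round.
    """
    for _ in range(rounds):
        text = text.encode("utf-8").decode(wrong_codec)
    return text
```

Starting from a single "é", rounds 0 to 3 give strings of length 1, 2, 4 and 8, and the third round is "ÃƒÆ’Ã‚Â©". That growth could be one reason a re-import increases the number of bad characters instead of reducing it.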

UPDATE3

I ran an SQL command in PHPMyAdmin to display the character sets:

  • character_set_client utf8
  • character_set_connection utf8
  • character_set_database latin1
  • character_set_filesystem binary
  • character_set_results utf8
  • character_set_server latin1
  • character_set_system utf8

So, perhaps my database needs to be converted (or deleted and recreated) to UTF-8. Could this pose a problem if the MySQL server is latin1?

Can MySQL handle the translation of serving content as UTF8 but storing it as latin1? I don't think it can, as UTF8 is a superset of latin1. My web hosting support has not replied in 48 hours. Might be too hard for them.
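
Whether text can be stored in a narrower character set comes down to whether every character has a slot in it. Here is a small Python check, using Python's codecs as stand-ins (MySQL describes its latin1 as "cp1252 West European", so cp1252 is the closer comparison; the helper name is made up):

```python
def fits_in(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False
```

Here fits_in("café", "latin-1") is True and fits_in("€", "cp1252") is True, while fits_in("€", "latin-1") and fits_in("日本語", "cp1252") are both False. Everything fits in "utf-8".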

Steve asked Oct 22 '11 09:10



2 Answers

If the charset of the tables matches the charset of their content, try mysql_set_charset('UTF8', $link_identifier). Note that MySQL names the encoding UTF8, without the hyphen, rather than the more common UTF-8.

Check my other answer on a similar question too.

AlexV answered Sep 23 '22 01:09


This is surely an encoding problem. Your database and your website use different encodings, and that mismatch is the cause of the problem. Also, if you ran that command, you still have to convert the records that are already in your tables to UTF-8.
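
As an illustration of fixing values that are already damaged, here is a hedged Python sketch that peels off layers of mis-decoding. It assumes the damage came from UTF-8 bytes being read as cp1252, which may not match your data, and the function name and the max_rounds limit are my own choices:

```python
def repair_mojibake(text: str, wrong_codec: str = "cp1252", max_rounds: int = 5) -> str:
    """Undo up to max_rounds layers of 'UTF-8 read as wrong_codec' damage.

    Stops as soon as a round fails to encode or decode, or changes nothing,
    so text that is already clean is returned untouched.
    """
    for _ in range(max_rounds):
        try:
            candidate = text.encode(wrong_codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # this layer is not mojibake of the assumed kind
        if candidate == text:
            break  # pure ASCII or nothing left to undo
        text = candidate
    return text
```

For example, repair_mojibake("ÃƒÂ©") returns "é". A value that mixes damaged and correct non-ASCII characters will not round-trip and is left as it is, so try it on a copy of the data first.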

Update: Based on your last comment, the core of the problem is that your database and your data source (the CSV file) use different encodings. Hence you can either convert your database to UTF-8 or, at the very least, convert the CSV data from UTF-8 to latin1 when you import it.

You can do the conversion by following these articles:

  • Convert latin1 to UTF8
  • http://wordpress.org/support/topic/convert-latin1-to-utf-8
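
If you go the route of converting the CSV file itself before import, here is a minimal Python sketch. It assumes the file really is UTF-8 and the target is cp1252 (the closest Python codec to MySQL's latin1); the function name and the paths are placeholders:

```python
from pathlib import Path


def transcode_file(src: str, dst: str,
                   src_encoding: str = "utf-8",
                   dst_encoding: str = "cp1252") -> None:
    """Rewrite a text file from one encoding to another.

    Works on raw bytes so line endings are preserved. Raises
    UnicodeDecodeError if src is not valid src_encoding, and
    UnicodeEncodeError if a character has no slot in dst_encoding.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))
```

A call such as transcode_file("products_utf8.csv", "products_latin1.csv") would write the converted copy next to the original. The strict error handling is deliberate: it fails loudly instead of silently substituting characters.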

Aurelio De Rosa answered Sep 23 '22 01:09