Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Special characters won't work in MySQL (UTF-8)

So, I've had some issues while trying to come over from Latin1 encoded databases, tables as well as columns, and now that everything is finally in UTF-8, I can't seem to update a row in a column. I am trying to replace an "e" with an e with acute (é). But it gives me this:

ERROR 1366 (HY000): Incorrect string value: '\x82m ...' for column 'Name' at row 1

when running this:

UPDATE access SET Name='ém' WHERE id="2";

All databases gives me this when running the status command (except the current database part of course):


Connection id:          1  
Current database:       access  
Current user:           root@localhost  
SSL:                    Not in use  
Using delimiter:        ;  
Server version:         5.1.47-community MySQL Community Server (GPL)  
Protocol version:       10  
Connection:             localhost via TCP/IP  
Server characterset:    utf8  
Db     characterset:    utf8  
Client characterset:    utf8  
Conn.  characterset:    utf8  
TCP port:               3306  
Uptime:                 20 min 16 sec  

Threads: 1 Questions: 110 Slow queries: 0 Opens: 18 Flush tables: 1 Open tables: 11  Queries per second avg: 0.90

And running the chcp command in cmd gives me 850. Oh, and at some points I got this:

ERROR 1300 (HY000): Invalid utf8 character string: 'ém' WHERE id="2"

I've looked everywhere for a solution, but I couldn't seem to find anything anywhere, and since I've always had good responses on Stackoverflow, I thought I'd ask here.

Thanks for any help!

like image 697
Nisto Avatar asked Feb 27 '23 15:02

Nisto


2 Answers

The solution is to set the connection variables to whatever codepage your installation of windows uses (not latin1 like what a lot of pages out there recommend - cmd.exe's character encoding isn't latin1).

In my case the codepage is 850:

mysql> SET NAMES cp850;

Here's an example with the connection set to UTF-8:

mysql> show variables like '%char%';
+--------------------------+---------------------------------+
| Variable_name            | Value                           |
+--------------------------+---------------------------------+
| character_set_client     | utf8                            |
| character_set_connection | utf8                            |
| character_set_database   | utf8                            |
| character_set_filesystem | binary                          |
| character_set_results    | utf8                            |
| character_set_server     | utf8                            |
| character_set_system     | utf8                            |
| character_sets_dir       | C:\xampp\mysql\share\charsets\  |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)

This is what happens to accented characters:

mysql> select nom from assignatura where nom like '%prob%';
+---------------------------------------+
| nom                                   |
+---------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+---------------------------------------+
2 rows in set (0.03 sec)

Notice the extraneous character just before the á. Also the accent is the wrong direction, it should be à.

After executing SET NAMES cp850;:

mysql> show variables like '%char%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | cp850                          |
| character_set_connection | cp850                          |
| character_set_database   | utf8                           |
| character_set_filesystem | binary                         |
| character_set_results    | cp850                          |
| character_set_server     | utf8                           |
| character_set_system     | utf8                           |
| character_sets_dir       | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)

We finally get the correct accented character:

mysql> select nom from assignatura where nom like '%prob%';
+--------------------------------------+
| nom                                  |
+--------------------------------------+
| Probabilitat i Processos Estocàstics |
| Probabilitat i Processos Estocàstics |
+--------------------------------------+
2 rows in set (0.00 sec)
like image 92
bob esponja Avatar answered Mar 04 '23 23:03

bob esponja


This thread, although somewhat old, seems to result in the conclusion that cmd.exe and the mysql client don't handle UTF-8 encoding properly (with the blame being more aimed at cmd.exe).

Reading in SQL from a file is recommended, as is using an alternative client - or a flavour of UNIX. :)

like image 28
Will A Avatar answered Mar 04 '23 23:03

Will A