
Detecting utf8 broken characters in MySQL

Tags:

mysql

utf-8

I've got a database with a bunch of broken UTF-8 characters scattered across several tables. The list of affected characters isn't very extensive, AFAIK (áéíúóÁÉÍÓÚÑñ).

Fixing a given table is very straightforward:

UPDATE orderItem SET itemName = REPLACE(itemName, 'Ã¡', 'á');

But I can't find a way to detect the broken characters. If I do something like

SELECT * FROM TABLE WHERE field LIKE "%Ã%"; 

I get nearly all the fields because of the collation (Ã=a). All broken characters so far start with an "Ã". The database is in Spanish, so this particular character isn't otherwise used.

The list of broken chars I've got so far is

Ã¡ = á
Ã© = é
Ã- = í
Ã³ = ó
Ã± = ñ
Ã = Á (the second byte, 0x81, has no printable form)

Any idea how to make this SELECT work as intended? (A binary search or something like that.)

The Disintegrator asked Sep 25 '09 09:09



2 Answers

I fixed it with:

UPDATE wp_zcs9ck_posts_copy SET post_title = CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);

Complete solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
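For readers who want to see what the nested CONVERT does at the byte level, here is a minimal Python sketch of the same round trip: turn each character back into the single byte it came from, then decode the bytes as UTF-8. The function names are made up for illustration, and the handling of the five bytes that cp1252 leaves undefined is an assumption based on my understanding of how MySQL's latin1 treats them. It also gives a rough way to detect affected values: a string whose round trip succeeds and changes it was most likely double encoded.

```python
# Bytes that cp1252 leaves undefined; MySQL's latin1 is assumed here to map
# them to the same-numbered code points (U+0081, U+008D, ...).
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin1/cp1252' corruption.

    Returns the input unchanged when the round trip is impossible,
    which usually means the text was fine to begin with.
    """
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")          # character -> original byte
        except UnicodeEncodeError:
            if ord(ch) in UNDEFINED_IN_CP1252:
                raw.append(ord(ch))             # invisible control character
            else:
                return text                     # cannot have come from one byte
    try:
        return raw.decode("utf-8")              # bytes -> intended text
    except UnicodeDecodeError:
        return text                             # not valid UTF-8: leave as is


def is_double_encoded(text: str) -> bool:
    """True when the round trip succeeds and changes the text."""
    return repair_double_encoded(text) != text


print(repair_double_encoded("aÃ±o"))   # año
print(is_double_encoded("año"))        # False
```

Correctly stored text such as "año" survives untouched because its latin1 bytes are not valid UTF-8, so the decode step fails and the original is returned. Plain ASCII is also left alone, since it round-trips to itself.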

Thales Ceolin answered Oct 05 '22 23:10


UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ã¡', 'á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ã¤', 'ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ã©', 'é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'í©', 'é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ã³', 'ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'íº', 'ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ãº', 'ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ã±', 'ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'í‘', 'Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Ã', 'í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€“', '–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€™', '\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€¦', '...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€“', '-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€œ', '"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€', '"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€˜', '\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€¢', '-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'â€¡', 'c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`, 'Â', '');
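The broken form in each REPLACE pair above does not have to be typed by hand. In the common case it is simply the UTF-8 bytes of the intended character read one byte at a time as cp1252. The Python sketch below derives the broken form and prints one statement per character. The helper names are hypothetical, `table_name` and `column_name` are the placeholders from the answer, and the fallback for bytes that cp1252 leaves undefined is an assumption about how MySQL's latin1 behaves.

```python
def broken_form(ch: str) -> str:
    """Return how `ch` looks after its UTF-8 bytes are misread as cp1252."""
    out = []
    for byte in ch.encode("utf-8"):
        try:
            out.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # cp1252 has no character for this byte; keep the same-numbered
            # code point, which shows up as an invisible control character.
            out.append(chr(byte))
    return "".join(out)


def replace_statement(table: str, column: str, ch: str) -> str:
    """Build one UPDATE ... REPLACE statement for a single character."""
    bad = broken_form(ch).replace("'", "''")    # double quotes for SQL
    good = ch.replace("'", "''")
    return (f"UPDATE `{table}` SET `{column}` = "
            f"REPLACE(`{column}`, '{bad}', '{good}');")


# One statement per character the question lists as affected.
for ch in "áéíóúñÑ":
    print(replace_statement("table_name", "column_name", ch))
```

Generating the list this way covers the regular cases only. Irregular pairs such as 'í©' in the answer come from some other corruption and would still need to be added manually.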
Raúl Avila Solano answered Oct 06 '22 01:10