Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing strange characters from MySQL data

Tags:

mysql

Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.

When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.

Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?

like image 306
Andy Soell Avatar asked May 09 '12 19:05

Andy Soell


People also ask

How do I remove special characters from a MySQL query?

You can remove special characters from a database field using REPLACE() function.

How do I remove a junk character in SQL?

If you want to detect hidden or unwanted characters as part of an initial diagnosis, use LENGTH . Then, use TRIM to get rid of unwanted characters. We also discussed how you can nest SQL functions, which is a powerful technique in programming.

How do I remove non ascii characters in MySQL?

UPDATE tablename SET columnToCheck = REPLACE(CONVERT(columnToCheck USING ascii), '? ', '') WHERE ... If the database is encoded UTF8 (mysql default), the first code will delete all characters with accents, which could be undesirable.

What characters need to be escaped MySQL?

Backslash ( \ ) and the quote character used to quote the string must be escaped. In certain client environments, it may also be necessary to escape NUL or Control+Z.


2 Answers

This is an encoding problem; the  is a non-breaking space (HTML entity  ) in Unicode being displayed in Latin1.

You might try something like this... first we check to make sure the matching is working:

SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'

This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:

UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )

And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).

If it doesn't work then you'll need to look at the encoding and collation being used.

EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.

like image 200
kitti Avatar answered Sep 23 '22 19:09

kitti


The accepted answer did not work for me.

From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I have found that the binary code for  character is c2a0 (by converting the column to VARBINARY and looking what it turns to). Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html found the actual solution to remove (replace) it:

update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;

The query above replaces it to a space, then a normal trim can be applied or simply replace to '' instead.

like image 40
user109764 Avatar answered Sep 24 '22 19:09

user109764