Remove special characters from a database field

I have a database with several thousand records, and I need to strip down one of the fields to ensure that it only contains certain characters (Alphanumeric, spaces, and single quotes). What SQL can I use to strip any other characters (such as slashes, etc) from that field in the whole database?

MarathonStudios asked May 20 '11 02:05




4 Answers

UPDATE mytable
SET FieldName = REPLACE(FieldName, '/', '');

That's a good place to start.
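
To strip several characters this way, the REPLACE() calls have to be nested, one per character. As a rough sketch (written in Python, with a hypothetical helper name `nested_replace_sql`, and assuming MySQL's default string-literal escaping rules), the nested expression can be generated instead of typed by hand:

```python
def nested_replace_sql(column, chars):
    """Build a nested REPLACE() expression that deletes each character in chars.

    `column` is inserted verbatim, so it must be a trusted identifier.
    """
    expr = column
    for ch in chars:
        # Escape for a MySQL single-quoted literal: backslash first, then the quote.
        literal = ch.replace("\\", "\\\\").replace("'", "''")
        expr = f"REPLACE({expr}, '{literal}', '')"
    return expr


if __name__ == "__main__":
    # Prints an UPDATE that removes forward slashes and backslashes.
    expression = nested_replace_sql("FieldName", ["/", "\\"])
    print(f"UPDATE mytable SET FieldName = {expression};")
```

This only scales to a short, known list of unwanted characters; it cannot express "everything except the allowed set".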

Vinnie answered Oct 08 '22 01:10


I have created a simple function for this:

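DELIMITER $$

DROP FUNCTION IF EXISTS `regex_replace`$$

-- Replaces every character of `original` that matches `pattern` with `replacement`.
-- The pattern is tested against one character at a time, so it should be a
-- single-character pattern such as a character class.
CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000), replacement VARCHAR(1000), original VARCHAR(1000)) RETURNS VARCHAR(1000) CHARSET utf8mb4
    DETERMINISTIC
BEGIN
    DECLARE temp VARCHAR(1000);
    DECLARE ch VARCHAR(1);
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    -- Only walk the string if at least one character matches.
    IF original REGEXP pattern THEN
        loop_label: LOOP
            IF i > CHAR_LENGTH(original) THEN
                LEAVE loop_label;
            END IF;

            SET ch = SUBSTRING(original, i, 1);

            IF NOT ch REGEXP pattern THEN
                SET temp = CONCAT(temp, ch);
            ELSE
                SET temp = CONCAT(temp, replacement);
            END IF;

            SET i = i + 1;
        END LOOP;
    ELSE
        SET temp = original;
    END IF;

    RETURN temp;
END$$

DELIMITER ;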

Usage example:

SELECT <field-name> AS NormalText, regex_replace('[^A-Za-z0-9 ]', '', <field-name>) AS RegexText
FROM <table-name>;
Juned Ansari answered Oct 08 '22 01:10


The REPLACE() function is the first choice. However, special characters can be tricky to type in a console. For those, you can combine REPLACE() with the CHAR() function.

For example, to remove € (byte 128 in Windows-1252, assuming the column is stored as latin1):

UPDATE products SET description = REPLACE(description, CHAR(128), '');

You can look up the other values in an ASCII or code-page table.

Ideally you would use a regex to find all the special characters, but that's not possible in MySQL versions before 8.0, which have no REGEXP_REPLACE() function.

Beyond that, you'd need to run it through your favorite scripting language.
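
As one possible sketch of the scripting route, the following Python keeps only the characters named in the question (ASCII letters, digits, spaces, and single quotes). The helper names `strip_disallowed` and `clean_column` are made up for illustration, and the standard-library sqlite3 module stands in for a real MySQL connection so that the example runs on its own; with a MySQL driver the placeholder style would typically be %s instead of ?.

```python
import re
import sqlite3

# Anything that is NOT an ASCII letter, digit, space, or single quote.
DISALLOWED = re.compile(r"[^A-Za-z0-9 ']")


def strip_disallowed(text):
    """Return text with every disallowed character removed."""
    return DISALLOWED.sub("", text)


def clean_column(conn, table, key_column, column):
    """Clean `column` in every row of `table`; return the number of rows changed.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT {key_column}, {column} FROM {table}")
    changes = []
    for key, value in cur.fetchall():
        if value is None:
            continue  # leave NULLs alone
        cleaned = strip_disallowed(value)
        if cleaned != value:
            changes.append((cleaned, key))
    # Only rows that actually changed are written back.
    cur.executemany(
        f"UPDATE {table} SET {column} = ? WHERE {key_column} = ?", changes
    )
    conn.commit()
    return len(changes)


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, FieldName TEXT)")
    demo.execute("INSERT INTO mytable (FieldName) VALUES (?)", ("Bob's/Burgers #1",))
    clean_column(demo, "mytable", "id", "FieldName")
    print(demo.execute("SELECT FieldName FROM mytable").fetchone()[0])
```

Selecting first and updating only the rows that differ keeps the write volume small on a table where most values are already clean.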

DonaldSowell answered Oct 08 '22 01:10


This might be useful.

This solution doesn't involve creating procedures or functions, or lengthy nesting of REPLACE() within REPLACE(). Instead, it relies on the fact that all printable ASCII characters lie within the codes \x20-\x7E (hex representation). (Source: the ASCII article on Wikipedia.) Below are all the characters in that interval.

Hex: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E
Glyph:  space ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~

So a simple regular-expression replace will do the job:

SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName;

PHP Custom query string

$query = "select REGEXP_REPLACE(columnName, '(.*)[(].*[)](.*)', CONCAT('\\\\1', '\\\\2')) `Alias` FROM table_Name";

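The above statement removes a pair of brackets together with the content between them. For example, if the column contains 'Staff Orientation (CMST TOT)', the statement returns 'Staff Orientation' (followed by the space that preceded the bracket). Note that the \\1 and \\2 backreferences are MariaDB (PCRE) syntax; MySQL 8.0's REGEXP_REPLACE() expects $1 and $2 instead.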
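
To check what the bracket pattern does before running it against a table, it can be tried out in a scripting language. This is only a sketch using Python's re module (whose behaviour matches for this simple pattern); `drop_bracketed` is a hypothetical helper name.

```python
import re

# Same pattern as the SQL above: group 1, a bracketed part, group 2.
BRACKETED = re.compile(r"(.*)[(].*[)](.*)")


def drop_bracketed(text):
    """Remove one bracketed part (from the last '(' through the last ')')."""
    return BRACKETED.sub(r"\1\2", text)


if __name__ == "__main__":
    # repr() makes the leftover trailing space visible.
    print(repr(drop_bracketed("Staff Orientation (CMST TOT)")))
```

Because the first group is greedy, a value with several bracketed parts loses only the last one, and the space before the bracket is kept, so a TRIM() around the SQL expression may be wanted.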

PS: If you run any DML operation (SELECT, UPDATE, ...) through a prepared statement in a stored procedure, or through PHP as a custom query string, remember to escape each backslash once more, i.e.

SET @sql = CONCAT("SELECT REGEXP_REPLACE(columnName, '[^\\\\x20-\\\\x7E]', '') from tableName");
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

The above SQL statement performs a simple regular-expression replace (in effect, a removal) of all the special characters: the regex pattern matches every special character, and each match is replaced with nothing.

Explanation of the pattern

A character group starts with a square bracket. The first character inside it is a caret, which negates the group: the pattern matches the complement of the listed characters, i.e. every character other than those inside the square brackets.
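
The effect of the caret can be seen by running the group with and without it over the same string. This is a small sketch in Python's re module, which treats this character class the same way; the sample string and the constant names are made up.

```python
import re

PRINTABLE = re.compile(r"[\x20-\x7E]")       # characters inside the group
NOT_PRINTABLE = re.compile(r"[^\x20-\x7E]")  # caret: everything outside it

sample = "Café\t€5"

if __name__ == "__main__":
    print(PRINTABLE.findall(sample))      # what the plain group matches
    print(NOT_PRINTABLE.findall(sample))  # what the negated group matches
    print(NOT_PRINTABLE.sub("", sample))  # what is left after removal
```

Every character of the input lands in exactly one of the two lists, which is what "complement" means here.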

To summarize, the above statement will:

Leave unchanged: all the alphanumeric characters, punctuation characters, and arithmetic operators.

Remove: all the non-ASCII (Unicode) characters, including accented letters and non-Latin alphabets, and the other special (control) characters.

FYI: Line breaks (line feed \n 0A, carriage return \r 0D) and tabs (horizontal tab \t 09, vertical tab \v 0B) are not printable characters but are sometimes significant. If you want to keep them as well, add them to the group, i.e.

[^\x20-\x7E\x0A\x0D\x09\x0B]
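
The difference between the strict pattern and the variant that also keeps line breaks and tabs can be compared side by side. This is an illustrative sketch in Python; the function name `clean` and its `keep_whitespace` flag are assumptions, not part of the answer.

```python
import re

STRICT = re.compile(r"[^\x20-\x7E]")
# Also keeps line feed, carriage return, horizontal tab, and vertical tab.
KEEP_WHITESPACE = re.compile(r"[^\x20-\x7E\x0A\x0D\x09\x0B]")


def clean(text, keep_whitespace=True):
    """Remove everything outside printable ASCII, optionally keeping line breaks and tabs."""
    pattern = KEEP_WHITESPACE if keep_whitespace else STRICT
    return pattern.sub("", text)


if __name__ == "__main__":
    value = "line 1\r\n\tline 2 \u2603"
    print(repr(clean(value)))
    print(repr(clean(value, keep_whitespace=False)))
```

With the strict pattern, multi-line values collapse into one line with no separator, so the whitespace-keeping variant is usually the safer choice for free-text fields.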

Adeel Raza Azeemi answered Oct 08 '22 01:10