I have a database with several thousand records, and I need to strip down one of the fields to ensure that it only contains certain characters (Alphanumeric, spaces, and single quotes). What SQL can I use to strip any other characters (such as slashes, etc) from that field in the whole database?
You can remove special characters from a database field using the REPLACE() function. Note that TRIM() is not enough on its own here: it only removes a character or whitespace from the start, the end, or both ends of a string, not from the middle. For example, to strip slashes:
UPDATE mytable
SET FieldName = REPLACE(FieldName, '/', '');
That's a good place to start.
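If more than one character has to go, the REPLACE() calls can be nested. A minimal sketch, reusing the mytable and FieldName names from the statement above; the particular characters removed here (slash, backslash, double quote) are only an illustration:

```sql
-- Each REPLACE() strips one character; the innermost call runs first.
-- '\\' is a single backslash under MySQL's default string escaping.
UPDATE mytable
SET FieldName = REPLACE(REPLACE(REPLACE(FieldName, '/', ''), '\\', ''), '"', '');
```

This gets unwieldy quickly as the list of unwanted characters grows, so it is best suited to a short, known list.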
I have created a simple function for this:
DELIMITER $$

DROP FUNCTION IF EXISTS `regex_replace`$$
CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000), replacement VARCHAR(1000), original VARCHAR(1000))
RETURNS VARCHAR(1000) CHARSET utf8mb4
DETERMINISTIC
BEGIN
    DECLARE temp VARCHAR(1000);
    DECLARE ch VARCHAR(1);
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    -- Only walk the string if the pattern matches somewhere in it.
    IF original REGEXP pattern THEN
        loop_label: LOOP
            IF i > CHAR_LENGTH(original) THEN
                LEAVE loop_label;
            END IF;
            -- Test one character at a time, so the pattern must match single characters.
            SET ch = SUBSTRING(original, i, 1);
            IF NOT ch REGEXP pattern THEN
                SET temp = CONCAT(temp, ch);
            ELSE
                SET temp = CONCAT(temp, replacement);
            END IF;
            SET i = i + 1;
        END LOOP;
    ELSE
        SET temp = original;
    END IF;
    RETURN temp;
END$$

DELIMITER ;
Usage example:
SELECT <field-name> AS NormalText, regex_replace('[^A-Za-z0-9 ]', '', <field-name>) AS RegexText
FROM <table-name>;
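To write the cleaned value back and also keep single quotes, as the question asks, the same function could be used in an UPDATE. This is a sketch that assumes the regex_replace function above has been created, and it borrows the mytable and FieldName names from the first answer:

```sql
-- '' inside a single-quoted SQL string is one literal single quote,
-- so the class keeps letters, digits, spaces and single quotes.
UPDATE mytable
SET FieldName = regex_replace('[^A-Za-z0-9 '']', '', FieldName)
WHERE FieldName REGEXP '[^A-Za-z0-9 '']';  -- touch only rows that need cleaning
```

Running the SELECT form first on a few rows is a cheap way to check the pattern before updating several thousand records.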
REPLACE() is the first choice. However, special characters can sometimes be tricky to type in a console. For those, you can combine REPLACE() with the CHAR() function.
e.g. removing € (byte 128 in MySQL's latin1 character set, so this form suits latin1 columns)
UPDATE products SET description = REPLACE(description, CHAR(128), '');
You can find all the ASCII values here
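For a column stored as utf8mb4 the euro sign is not a single byte, so CHAR(128) would not match it. A hedged sketch of the same idea for that case, assuming the products table and description column from the example above and a utf8mb4 column:

```sql
-- In UTF-8 the euro sign is the three bytes E2 82 AC.
-- UNHEX() builds those bytes and CONVERT(... USING utf8mb4) turns them into a character.
UPDATE products
SET description = REPLACE(description, CONVERT(UNHEX('E282AC') USING utf8mb4), '');
```

SELECT HEX(description) on a sample row shows which bytes are actually stored, which helps when it is unclear what encoding the data uses.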
Ideally you could use a regex to find all the special chars, but that's not possible in MySQL versions before 8.0, which have no REGEXP_REPLACE().
Beyond that, you'd need to run it through your favorite scripting language.
This might be useful.
This solution doesn't involve creating procedures or functions, or lengthy nesting of REPLACE within REPLACE. Instead, we rely on the fact that all printable ASCII characters lie within the code range \x20-\x7E (hex representation). Source: ASCII, from Wikipedia, the free encyclopedia. Below are all the characters in that interval.
Hex 20-2F: space ! " # $ % & ' ( ) * + , - . /
Hex 30-3F: 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
Hex 40-4F: @ A B C D E F G H I J K L M N O
Hex 50-5F: P Q R S T U V W X Y Z [ \ ] ^ _
Hex 60-6F: ` a b c d e f g h i j k l m n o
Hex 70-7E: p q r s t u v w x y z { | } ~
So a simple regular-expression replace will do the job:
SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName;
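The SELECT above only displays the cleaned value. A sketch of how it might be turned into a preview followed by an actual update, assuming a server that provides REGEXP_REPLACE() (MySQL 8.0 or later, or MariaDB 10.0.5 or later) and the same placeholder names columnName and tableName:

```sql
-- Preview only the rows that contain at least one character outside \x20-\x7E.
SELECT columnName,
       REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') AS cleaned
FROM tableName
WHERE columnName REGEXP '[^\\x20-\\x7E]';

-- Once the preview looks right, apply the same expression in place.
UPDATE tableName
SET columnName = REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '')
WHERE columnName REGEXP '[^\\x20-\\x7E]';
```

The WHERE clause keeps the update from rewriting rows that are already clean.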
PHP Custom query string
$query = "select REGEXP_REPLACE(columnName, '(.*)[(].*[)](.*)', CONCAT('\\\\1', '\\\\2')) `Alias` FROM table_Name";
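The above statement removes the brackets together with the content between them. E.g. if the column contains 'Staff Orientation (CMST TOT)', the statement removes the brackets and their content, leaving 'Staff Orientation ' (with the trailing space). Note that the \1 and \2 backreference syntax is MariaDB's; MySQL 8.0 uses $1 and $2 in the replacement string instead.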
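The same result can be had without backreferences, which sidesteps the difference in backreference syntax between servers. A minimal sketch using a literal string in place of the column; the TRIM() call is an addition here, to drop the space left before the bracket:

```sql
-- [(] and [)] match literal brackets; [^)]* matches everything up to the closing bracket.
SELECT TRIM(REGEXP_REPLACE('Staff Orientation (CMST TOT)', '[(][^)]*[)]', '')) AS Alias;
-- returns 'Staff Orientation'
```

Unlike the greedy (.*) version, this pattern removes every bracketed group in the string, not just the last one.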
PS: If you run the statement (SELECT, UPDATE, ...) through a prepared statement in a stored procedure, or through PHP by building a custom query string, remember to escape the backslashes once more, i.e.
SET @sql = CONCAT("SELECT REGEXP_REPLACE(columnName, '[^\\\\x20-\\\\x7E]', '') from tableName");
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
The above SQL statement does a simple regular-expression replace (in effect, a removal) of all the special characters; i.e. the pattern matches every special character, and each match is replaced with nothing.
Explanation of the pattern
A character group starts with a square bracket. The first character inside it is a caret, which negates all the characters mentioned in the group (i.e. within the square brackets). This simply means the pattern matches the complement of the group: any character other than those listed.
Just to summarize, the above statement will:
Leave unchanged: all the alphanumeric characters, punctuation characters, and arithmetic operators.
Remove: all the Unicode characters outside the basic Latin (ASCII) range, and other special characters.
FYI: Remember that Enter (line feed \n 0A, carriage return \r 0D) and Tab (horizontal tab \t 09, vertical tab \v 0B) are not printable characters but are sometimes significant. If you want to keep them as well, add them to the group, i.e.
[^\x20-\x7E\x0A\x0D\x09\x0B]
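Putting the pattern variants into full statements may make the difference clearer. Both are sketches under the same assumptions as before (a server with REGEXP_REPLACE(), placeholder names columnName and tableName); the second one is narrowed to the character set the original question asks for:

```sql
-- Keep printable ASCII plus tab, line feed, vertical tab and carriage return.
SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E\\x09\\x0A\\x0B\\x0D]', '') AS cleaned
FROM tableName;

-- Keep only letters, digits, spaces and single quotes ('' is one literal quote).
SELECT REGEXP_REPLACE(columnName, '[^A-Za-z0-9 '']', '') AS cleaned
FROM tableName;
```

The second pattern is the stricter of the two: it also strips punctuation such as slashes, which the \x20-\x7E range leaves in place.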