
Finding and removing Non-ASCII characters from an Oracle Varchar2

We are currently migrating one of our Oracle databases to UTF-8 and have found a few records that are close to the 4000-byte VARCHAR2 limit. When we try to migrate these records, they fail because they contain characters that become multibyte UTF-8 characters. What I want to do within PL/SQL is locate these characters to see what they are, and then either change or remove them.

I would like to do:

SELECT REGEXP_REPLACE(COLUMN, '[^[:ascii:]]', '')

but Oracle does not implement the [:ascii:] character class.

Is there a simple way of doing what I want to do?

Paul Gilfedder asked Feb 10 '10 11:02



2 Answers

I think this will do the trick:

SELECT REGEXP_REPLACE(COLUMN, '[^[:print:]]', '') 
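
Before turning this into an UPDATE, it may be worth previewing which rows the expression would actually change. The following is only a sketch: my_table, id and my_col are hypothetical names, not from the original post. It also lets you check for yourself whether [:print:] treats accented or other non-ASCII letters as printable in your database, since that decides whether they get stripped at all.

```sql
-- Preview only; nothing is modified.
-- my_table, id and my_col are placeholder names, not from the original post.
SELECT id,
       my_col,
       REGEXP_REPLACE(my_col, '[^[:print:]]', '') AS cleaned,
       LENGTHB(my_col)                            AS bytes_before
FROM   my_table
-- A shorter result means at least one character was removed.
-- NVL covers the case where everything is stripped and the result is NULL.
WHERE  NVL(LENGTH(REGEXP_REPLACE(my_col, '[^[:print:]]', '')), 0) < LENGTH(my_col);
```

If the characters that break the migration do not show up in this list, the pattern is not catching them and a different approach is needed.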
Yuri Tkachenko answered Oct 01 '22 13:10


If you use the ASCIISTR function to convert the non-ASCII characters to literals of the form \nnnn (a backslash followed by four hex digits), you can then use REGEXP_REPLACE to strip those literals out, like so...

UPDATE table SET field = REGEXP_REPLACE(ASCIISTR(field), '\\[[:xdigit:]]{4}', '') 

...where field and table are your field and table names respectively.
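
To see what the two steps do before updating anything, a dry run along these lines may help. It is a sketch under assumptions: the test string is made up, and my_table, id and my_col are hypothetical names. The first query needs no table at all; the second lists the rows that contain something ASCIISTR would escape, which also shows which characters they are.

```sql
-- Dry run against DUAL; nothing is modified.
-- UNISTR builds a test string containing e-acute (U+00E9) and the euro sign (U+20AC).
SELECT ASCIISTR(UNISTR('caf\00E9 \20AC5'))          AS escaped,
       REGEXP_REPLACE(ASCIISTR(UNISTR('caf\00E9 \20AC5')),
                      '\\[[:xdigit:]]{4}', '')      AS stripped
FROM   dual;

-- Preview against real data. my_table, id and my_col are placeholder names.
-- Rows are returned only when escaping changed the value.
SELECT id,
       ASCIISTR(my_col) AS escaped
FROM   my_table
WHERE  ASCIISTR(my_col) <> my_col;
```

The escaped column shows the \nnnn literals and the stripped column shows what the UPDATE would write back. Two things to confirm on your own version: as far as I know ASCIISTR also escapes a literal backslash (as \005C), so such rows would appear in the preview and lose the backslash in the UPDATE; and because each escaped character takes five characters, the intermediate result for values already near the 4000-byte limit may not fit, so try it on a copy first.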

Robb Smith answered Oct 01 '22 11:10