The database: I have a Oracle11g database with a table that contains about 10 million rows and has about 40 columns. The data originates from punched card age and has been transformed from one Oracle version to the next several times. This is a live production database and in somewhat continuous use. But it's not so critical (especially during night time) that I could not strain it with expensive queries and updates. So that's ok.
edit: Charset is AL32UTF8.
The problem: I noticed that some columns contain NUL characters. I have found values that consist of 1-4 NUL characters, but there can be in theory as many of them than the number of chars available for that column. I want to get rid of the NUL characters. If there are only NUL characters in the value, I want to change the column's value to SQL NULL. If there are NUL characters between other characters (I haven't though seen any cases) I want to remove them (replace with "").
What I have tried:
I have noticed that select rawtohex(mycolumn) from mytable
returns for example '000000' (3 NUL characters).
select rawtohex('A') from dual;
returns '61'
select rawtohex('Ä') from dual;
returns 'C385'.
I have explored finding NUL characters like this:
SELECT DISTINCT mycolumn
FROM mytable
WHERE rawtohex(mycolumn) LIKE '%00%;'
So far I haven't been able to find any columns that would contain any characters that's rawtohex would contain '00', except columns where there are only NUL characters. So it looks like using LIKE '%00%'
is safe. But I'm uncertain of oracle implementation of rawtohex and what kind of hex coding it uses.
So... While I continue exploring this path and eventually write a script to fix the garbage, I'm asking if anyone has encountered this job before and how did you solve it. :)
Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls. Any arithmetic expression containing a null always evaluates to null.
select translate(your_column, chr(10)||chr(11)||chr(13), ' ') from your_table; This replaces newline, tab and carriage return with space.
Hi: select replace(column_name,CHR(13),'') from table_name; If it didn't work, try CHR(10) intead of CHR(13).
The value null represents the absence of any object, while the empty string is an object of type String with zero characters. If you try to compare the two, they are not the same.
The Oracle NVL() function allows you to replace null with a more meaningful alternative in the results of a query.
Personally I'd use CHR()
to identify the nul values. A nul is an ASCII 0 and CHR()
will return the character representation of the number you pass in.
SQL> with the_data as (
2 select 'a' || chr(0) || 'b' as str from dual
3 union all
4 select 'a' || 'c' from dual
5 )
6 select dump(str)
7 from the_data
8 where str like '%' || chr(0) || '%'
9 ;
DUMP(STR)
----------------------------------------------------
Typ=1 Len=3: 97,0,98
As you can see by concatenating percentage signs around CHR(0)
(which is equivalent to nul) you can return rows with the nul in.
DUMP()
returns the data type (1 means VARCHAR2) the length of the string in bytes and the internal representation of the data; the default is binary.
However, you need to be careful with multibyte data as CHR()
returns the character equivalent of the modulus of 256 of the number:
SQL> with the_data as (
2 select 'a' || chr(0) || 'b' as str from dual
3 union all
4 select 'a' || chr(256) || 'c' from dual
5 )
6 select dump(str)
7 from the_data
8 where str like '%' || chr(0) || '%'
9 ;
DUMP(STR)
-------------------------------------------------
Typ=1 Len=3: 97,0,98
Typ=1 Len=4: 97,1,0,99
As you can see you would mistakenly identify a nul here, using either CHR()
or DUMP()
In other words, if you don't have multibyte data then the easiest thing do is simply replace it:
update <table>
set <column> = replace(<column>, chr(0));
Utilising RAWTOHEX()
has similar problems; though you can find the 00
there's no guarantee that it's actually a nul:
SQL> with the_data as (
2 select 'a' || chr(0) || 'b' as str from dual
3 union all
4 select 'a' || chr(256) || 'c' from dual
5 )
6 select rawtohex(str)
7 from the_data
8 where str like '%' || chr(0) || '%'
9 ;
RAWTOHEX
--------
610062
61010063
It actually has a further problem as well; imagine you had two characters 10
and 06
the returned value is then 1006
and you'll find 00
. If you were to use this method you have to ensure that you only looked at two character groups, from the start of the string.
As the internal representation of a nul character is used to represent parts of other, multibyte characters, you can't just replace them as you don't know whether it's one character or half a character. So, if you're using a multibyte character set you are, as far as I know, not going to be able to do this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With