Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get rid of NUL characters in Oracle database?

The database: I have a Oracle11g database with a table that contains about 10 million rows and has about 40 columns. The data originates from punched card age and has been transformed from one Oracle version to the next several times. This is a live production database and in somewhat continuous use. But it's not so critical (especially during night time) that I could not strain it with expensive queries and updates. So that's ok.

edit: Charset is AL32UTF8.

The problem: I noticed that some columns contain NUL characters. I have found values that consist of 1-4 NUL characters, but there can be in theory as many of them than the number of chars available for that column. I want to get rid of the NUL characters. If there are only NUL characters in the value, I want to change the column's value to SQL NULL. If there are NUL characters between other characters (I haven't though seen any cases) I want to remove them (replace with "").

What I have tried:

I have noticed that select rawtohex(mycolumn) from mytable returns for example '000000' (3 NUL characters).

select rawtohex('A') from dual; returns '61'

select rawtohex('Ä') from dual; returns 'C385'.

I have explored finding NUL characters like this:

SELECT DISTINCT mycolumn 
FROM mytable
WHERE rawtohex(mycolumn) LIKE '%00%;'

So far I haven't been able to find any columns that would contain any characters that's rawtohex would contain '00', except columns where there are only NUL characters. So it looks like using LIKE '%00%' is safe. But I'm uncertain of oracle implementation of rawtohex and what kind of hex coding it uses.

So... While I continue exploring this path and eventually write a script to fix the garbage, I'm asking if anyone has encountered this job before and how did you solve it. :)

like image 510
Esko Piirainen Avatar asked Jun 13 '14 09:06

Esko Piirainen


People also ask

How are nulls treated in Oracle?

Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls. Any arithmetic expression containing a null always evaluates to null.

How replace Enter key with space in Oracle?

select translate(your_column, chr(10)||chr(11)||chr(13), ' ') from your_table; This replaces newline, tab and carriage return with space.

How do I cut a new line character in Oracle?

Hi: select replace(column_name,CHR(13),'') from table_name; If it didn't work, try CHR(10) intead of CHR(13).

Is null and empty string the same in Oracle?

The value null represents the absence of any object, while the empty string is an object of type String with zero characters. If you try to compare the two, they are not the same.

What is NVL command in Oracle?

The Oracle NVL() function allows you to replace null with a more meaningful alternative in the results of a query.


1 Answers

Personally I'd use CHR() to identify the nul values. A nul is an ASCII 0 and CHR() will return the character representation of the number you pass in.

SQL> with the_data as (
  2  select 'a' || chr(0) || 'b' as str from dual
  3   union all
  4  select 'a' || 'c' from dual
  5         )
  6  select dump(str)
  7    from the_data
  8   where str like '%' || chr(0) || '%'
  9         ;

DUMP(STR)
----------------------------------------------------    
Typ=1 Len=3: 97,0,98

As you can see by concatenating percentage signs around CHR(0) (which is equivalent to nul) you can return rows with the nul in.

DUMP() returns the data type (1 means VARCHAR2) the length of the string in bytes and the internal representation of the data; the default is binary.

However, you need to be careful with multibyte data as CHR() returns the character equivalent of the modulus of 256 of the number:

SQL> with the_data as (
  2  select 'a' || chr(0) || 'b' as str from dual
  3   union all
  4  select 'a' || chr(256) || 'c' from dual
  5         )
  6  select dump(str)
  7    from the_data
  8   where str like '%' || chr(0) || '%'
  9         ;

DUMP(STR)
-------------------------------------------------
Typ=1 Len=3: 97,0,98
Typ=1 Len=4: 97,1,0,99

As you can see you would mistakenly identify a nul here, using either CHR() or DUMP()

In other words, if you don't have multibyte data then the easiest thing do is simply replace it:

update <table>
   set <column> = replace(<column>, chr(0));

Utilising RAWTOHEX() has similar problems; though you can find the 00 there's no guarantee that it's actually a nul:

SQL> with the_data as (
  2  select 'a' || chr(0) || 'b' as str from dual
  3   union all
  4  select 'a' || chr(256) || 'c' from dual
  5         )
  6  select rawtohex(str)
  7    from the_data
  8   where str like '%' || chr(0) || '%'
  9         ;

RAWTOHEX
--------
610062
61010063

It actually has a further problem as well; imagine you had two characters 10 and 06 the returned value is then 1006 and you'll find 00. If you were to use this method you have to ensure that you only looked at two character groups, from the start of the string.

As the internal representation of a nul character is used to represent parts of other, multibyte characters, you can't just replace them as you don't know whether it's one character or half a character. So, if you're using a multibyte character set you are, as far as I know, not going to be able to do this.

like image 156
Ben Avatar answered Sep 20 '22 14:09

Ben