
Postgres error on insert - ERROR: invalid byte sequence for encoding "UTF8": 0x00

Tags:

postgresql

I get the error below when inserting data from MySQL into PostgreSQL.

Do I have to manually remove all null characters from my input data, or is there a way to get PostgreSQL to do this for me?

ERROR: invalid byte sequence for encoding "UTF8": 0x00
ScArcher2 asked Aug 28 '09 15:08



4 Answers

PostgreSQL doesn't support storing the null character (the byte 0x00) in text fields. This is different from the database NULL value, which is fully supported.

Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

If you need to store the null character, you must use a bytea field, which can hold arbitrary bytes but doesn't support text operations.

Because PostgreSQL doesn't support the null character in text values, there's no good way to have the server strip it for you. You could import your data into bytea and later convert it to text with a custom function (in Perl, for example), but it will likely be easier to strip the null characters in preprocessing, before you load the data.
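
As a rough illustration of the preprocessing route, here is a minimal Python sketch that strips the null character from string columns before they are handed to an insert. The names strip_nul and clean_row and the sample rows are hypothetical, made up for this example, and no database driver is involved.

```python
def strip_nul(value):
    """Return value with U+0000 removed if it is a str; leave other types alone."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


def clean_row(row):
    """Apply strip_nul to every column of one row (a tuple or list)."""
    return tuple(strip_nul(col) for col in row)


# Hypothetical rows, shaped like what a MySQL cursor might return.
rows = [(1, "abc\x00def"), (2, None), (3, "clean")]
cleaned = [clean_row(r) for r in rows]
print(cleaned)
```

The cleaned rows can then be passed to whatever insert call you already use. Non-string columns, including None, pass through unchanged.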

Magnus Hagander answered Oct 18 '22 18:10



Just regex out null bytes:

s/\x00//g;
hicham answered Oct 18 '22 16:10



If you are using Java, you can remove the 0x00 characters before the insert, like this:

myValue.replaceAll("\u0000", "")

The solution was provided and explained by Csaba in the following post:

https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de

To quote it:

in Java you can actually have a "0x0" character in your string, and that's valid unicode. So that's translated to the character 0x0 in UTF8, which in turn is not accepted because the server uses null terminated strings... so the only way is to make sure your strings don't contain the character '\u0000'.
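
The quoted explanation can be checked in a few lines. The Python sketch below is only an illustration and is not taken from the linked post. It shows that U+0000 is a legal character that encodes to a single 0x00 byte in UTF-8, and it mimics what a reader of null-terminated strings would see. The sample string is made up.

```python
text = "abc\u0000def"          # a valid Unicode string containing U+0000
encoded = text.encode("utf-8")
print(encoded)                 # b'abc\x00def'

# Code that treats the bytes as a null-terminated string stops at the
# first 0x00 byte, so everything after it would be lost.
c_view = encoded.split(b"\x00", 1)[0]
print(c_view)                  # b'abc'
```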

David Dal Busco answered Oct 18 '22 18:10



Only this regex worked for me:

sed 's/\\0//g'

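So as you fetch your data, pipe it through sed: $ get_data | sed 's/\\0//g'. This pattern matches the literal two-character sequence \0 (a backslash followed by a zero), which is presumably how the null bytes were escaped in the input, and outputs your data without them. If the input contains raw 0x00 bytes instead, tr -d '\000' removes those.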
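
For comparison, here is a hedged Python sketch of the same filtering step. It separates the two cases that are easy to confuse: raw 0x00 bytes in the stream, and the escaped two-character sequence of a backslash followed by a zero, which is what the sed pattern in this answer matches. The function names and sample data are hypothetical.

```python
import io


def strip_nul_bytes(src, dst, chunk_size=65536):
    """Copy the binary stream src to dst, dropping raw 0x00 bytes."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.replace(b"\x00", b""))


def strip_escaped_nul(data):
    r"""Drop the literal sequence backslash + '0', like sed 's/\\0//g'."""
    return data.replace(b"\\0", b"")


# Raw null bytes in a stream.
src = io.BytesIO(b"a\x00b\x00c")
dst = io.BytesIO()
strip_nul_bytes(src, dst)
print(dst.getvalue())  # b'abc'

# Escaped null bytes, as they might appear inside a dumped string literal.
dumped = b"'foo\\0bar'"
print(strip_escaped_nul(dumped))
```

Note that the escaped variant is a plain substring removal, so it would also hit a zero that follows an escaped backslash. Check a sample of the dump before relying on it.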

techkuz answered Oct 18 '22 16:10
