
Postgresql COPY with text value containing \0 (backslash 0)

Setup: Postgresql Server 9.3 - OS: CentOS 6.6

I'm attempting to bulk insert 250 million records into a PostgreSQL 9.3 server using the COPY command. The data is in delimited format, with a pipe '|' as the delimiter.

Almost all columns in the target table are TEXT datatypes. Unfortunately, about 2 million of the 250 million records have legitimate textual values containing "\0".

Example entry:

245150963|DATASOURCE|736778|XYZNR-1B5.1|10-DEC-1984 00:00:00|||XYZNR-1B5.1\1984-12-10\0.5\1\ASDF1|pH|Physical|Water|XYZNR|Estuary

As you can see, the 8th column has a legitimate \0 in its value.

XYZNR-1B5.1\1984-12-10\0.5\1\ASDF1

No matter how I escape this, the COPY command either converts the \0 into an actual null byte ("\x0") or fails with: ERROR: invalid byte sequence for encoding "UTF8": 0x00

I have tried replacing the \0 with "sed -i" with:

\\0
\\\0
'\0'
\'\'0
\\\\\0

... and many others I can't remember and none of them work.

What would be the correct escaping of these types of strings?

Thanks!

PAR asked Jun 08 '26 23:06


1 Answer

Per Postgres doc on COPY:

Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character.

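In COPY's text format, a backslash followed by one to three octal digits is read as the byte with that numeric code, so \0 becomes a null byte (which a text value cannot hold) and \1 becomes byte 0x01. Convert every backslash in that field to \\, not just the one in \0.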

Note that \b is not a shorthand for backslash: in COPY's text format it stands for a backspace (ASCII 8), so it cannot be used here.

So the doubled form is the one that should work:

XYZNR-1B5.1\\1984-12-10\\0.5\\1\\ASDF1
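
If it helps, here is a minimal Python sketch of that preprocessing step. It assumes every backslash in the file is literal data (none is an intentional COPY escape) and that no field contains a literal pipe or an embedded newline. The function names `escape_backslashes` and `escape_stream` are made up for illustration, and the sample row is a shortened version of the one in the question.

```python
import io
from typing import TextIO


def escape_backslashes(line: str) -> str:
    """Double every backslash so COPY's text format reads it as a literal one."""
    return line.replace("\\", "\\\\")


def escape_stream(src: TextIO, dst: TextIO) -> None:
    """Copy src to dst line by line, so a huge file never has to fit in memory."""
    for line in src:
        dst.write(escape_backslashes(line))


# Small in-memory demonstration (shortened row; assumption: pipes are only delimiters).
sample = io.StringIO("736778|XYZNR-1B5.1\\1984-12-10\\0.5\\1\\ASDF1|pH\n")
out = io.StringIO()
escape_stream(sample, out)
print(out.getvalue(), end="")
```

For the real file, the same function could be pointed at sys.stdin and sys.stdout and run as a filter before COPY. With sed, something like sed 's/\\/\\\\/g' should have the same effect, since it also doubles every backslash rather than only the one in \0.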

mujimu answered Jun 10 '26 19:06