Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

22021: invalid byte sequence for encoding "UTF8": 0x00

Tags:

c#

postgresql

I am doing a bulk import to PostgreSQL from C# and one of the records gives me this error:

22021: invalid byte sequence for encoding "UTF8": 0x00

I googled it and the general advice is that this refers to a null field but in my instance this is not the case. I tracked down the string that causes the error and it is this:

Addresses the following: Let $A$ be a Banach algebra, and let $\sum:\0\rightarrow I\rightarrow\mathfrak A\overset\pi\to\longrightarrow A\rightarrow 0$ be an extension of $A$, where $\mathfrak A$ is a Banach algebra and $I$ is a closed ideal in $\mathfrak A$.

I am reading this from an XML file and have UTF-8 defined on the file stream.

The escaped string on my deserialized C# class is:

"Addresses the following: Let $A$ be a Banach algebra, and let $\\sum\\:\\0\\rightarrow I\\rightarrow\\mathfrak A\\overset\\pi\\to\\longrightarrow A\\rightarrow 0$ be an extension of $A$, where $\\mathfrak A$ is a Banach algebra and $I$ is a closed ideal in $\\mathfrak A$."

Obviously something is not right with the string. I am guessing some sort of mathmatical symbols should be there but what exactly about this is breaking the import and making PostgreSQL report that it is a null field? What format should that be read in?

If I manually overwite this field the import works so it is 100% an issue with this string.

like image 683
Guerrilla Avatar asked Nov 07 '22 16:11

Guerrilla


1 Answers

Since it's a bulk import, I'm assuming you're creating a file or some kind of big string to send to Postgres? In that case the strings probably have escape characters enabled, as opposed to executing this via, say, a prepared statement. So it's probably that \0 in your string that Postgres is escaping and interpreting as a 0x00.

from the docs: https://www.postgresql.org/docs/8.3/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g. E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character () begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represents a special byte value. \b is a backspace, \f is a form feed, \n is a newline, \r is a carriage return, \t is a tab. Also supported are \digits, where digits represents an octal byte value, and \xhexdigits, where hexdigits represents a hexadecimal byte value. (It is your responsibility that the byte sequences you create are valid characters in the server character set encoding.) Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.

So if your bulk statement is prepending strings with E, like E'hello', don't do that.

like image 133
Nick Garyu Avatar answered Nov 14 '22 22:11

Nick Garyu