
Are null bytes allowed in unicode strings in PostgreSQL via Python?

Are null bytes allowed in unicode strings?

I am not asking about UTF-8; I mean the high-level object representation of a Unicode string.

Background

We store Unicode strings containing null bytes in PostgreSQL via Python.

When we read them back, the strings are cut off at the null byte.

asked Mar 02 '15 15:03 by guettli




2 Answers

On the database side, PostgreSQL itself does not allow a null byte ('\0') in a string stored in char/text/varchar fields, so if you try to store a string containing one, you receive an error. Example:

postgres=# SELECT convert_from('foo\000bar'::bytea, 'unicode');
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0x00

If you really need to store such information, you can use the bytea data type on the PostgreSQL side. Make sure to encode it correctly.
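As a rough illustration of what "encode it correctly" could look like on the Python side, here is a minimal sketch using only the standard library. The helper names (`encode_for_bytea`, `decode_from_bytea`, `strip_nul`) are hypothetical and not part of any driver, and UTF-8 is an assumed choice of encoding. The sketch assumes the database driver accepts a `bytes` object for a bytea column and may hand back `bytes` or a `memoryview` on read.

```python
def encode_for_bytea(text: str) -> bytes:
    """Encode text as UTF-8; U+0000 becomes a single 0x00 byte."""
    return text.encode("utf-8")


def decode_from_bytea(data) -> str:
    """Decode a bytea value (bytes or memoryview) back into a str."""
    return bytes(data).decode("utf-8")


def strip_nul(text: str) -> str:
    """Alternative for text columns: drop NUL characters before storing."""
    return text.replace("\x00", "")


original = "foo\x00bar"

# Round trip through the byte representation; the NUL survives.
payload = encode_for_bytea(original)
restored = decode_from_bytea(memoryview(payload))

# Lossy alternative if the column has to stay char/text/varchar.
cleaned = strip_nul(original)
```

Whether to keep the null characters (bytea) or strip them (text) depends on whether they carry meaning in the application.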

answered Oct 29 '22 22:10 by MatheusOl


Python itself is perfectly capable of holding null characters (code point zero) in both byte strings and Unicode strings. However, if you call out to a library implemented in C, that library may use the C convention of stopping at the first null character.
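The difference between the two conventions can be sketched with the standard `ctypes` module. This only illustrates the general C behaviour; it is not a claim about which layer truncated the strings in the question.

```python
import ctypes

text = "foo\x00bar"

# Python keeps the null character as an ordinary code point.
length = len(text)

# Copy the encoded bytes into a C char buffer.
data = text.encode("utf-8")
buf = ctypes.create_string_buffer(data)

# .value reads the buffer as a NUL-terminated C string,
# so it stops at the first zero byte.
c_view = buf.value

# .raw returns the whole buffer, including the embedded NUL
# and the terminator that create_string_buffer appends.
full = buf.raw
```

Here `length` is 7, `c_view` is `b"foo"`, and `full` still starts with all seven original bytes, so the data is only lost when something treats it as a C string.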

answered Oct 29 '22 22:10 by Mark Ransom