Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unable to convert PostgreSQL text column to bytea

In my application I am using a postgresql database table with a "text" column to store pickled python objects. As database driver I'm using psycopg2 and until now I only passed python-strings (not unicode-objects) to the DB and retrieved strings from the DB. This basically worked fine until I recently decided to make String-handling the better/correct way and added the following construct to my DB-layer:

psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

This basically works fine everywhere in my application and I'm using unicode-objects where possible now.

But for this special case with the text-column containing the pickled objects it makes troubles. I got it working in my test-system this way:

  • retrieving the data: SELECT data::bytea, params FROM mytable
  • writing the data: execute("UPDATE mytable SET data=%s", (psycopg2.Binary(cPickle.dumps(x)),) )

... but unfortunately I'm getting errors with the SELECT for some columns in the production-system:

psycopg2.DataError: invalid input syntax for type bytea

This error also happens when I try to run the query in the psql shell.

Basically I'm planning to convert the column from "text" to "bytea", but the error above also prevents me from doing this conversion.

As far as I can see, (when retrieving the column as pure python string) there are only characters with ord(c)<=127 in the string.

like image 326
powo Avatar asked Oct 10 '13 15:10

powo


1 Answers

The problem is that casting text to bytea doesn't mean, take the bytes in the string and assemble them as a bytea value, but instead take the string and interpret it as an escaped input value to the bytea type. So that won't work, mainly because pickle data contains lots of backslashes, which bytea interprets specially.

Try this instead:

SELECT convert_to(data, 'LATIN1') ...

This converts the string into a byte sequence (bytea value) in the LATIN1 encoding. For you, the exact encoding doesn't matter, because it's all ASCII (but there is no ASCII encoding).

like image 107
Peter Eisentraut Avatar answered Oct 12 '22 09:10

Peter Eisentraut