I've encountered the following error:
File "/vagrant/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
cursor.execute(statement, parameters)
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 8410: ordinal not in range(128)
It happens when I'm trying to save an ORM object that has a Python unicode string assigned to it. As a result, the parameters dict has a unicode string as one of its values, and the error is raised while that value is being coerced to the str type.
I've tried setting convert_unicode=True on the engine and on the columns, but without success.
So what is a good way to handle unicode in SQLAlchemy?
Here are some details about my setup:
Table:
        Table "public.documents"
   Column   |           Type           |                       Modifiers
------------+--------------------------+--------------------------------------------------------
 id         | integer                  | not null default nextval('documents_id_seq'::regclass)
 sha256     | text                     | not null
 url        | text                     |
 source     | text                     | not null
 downloaded | timestamp with time zone | not null
 tags       | json                     | not null
Indexes:
    "documents_pkey" PRIMARY KEY, btree (id)
    "documents_sha256_key" UNIQUE CONSTRAINT, btree (sha256)
ORM model:
class Document(Base):
    __tablename__ = 'documents'
    id = Column(INTEGER, primary_key=True)
    sha256 = Column(TEXT(convert_unicode=True), nullable=False, unique=True)
    url = Column(TEXT(convert_unicode=True))
    source = Column(TEXT(convert_unicode=True), nullable=False)
    downloaded = Column(DateTime(timezone=True), nullable=False)
    tags = Column(JSON, nullable=False)
SQLAlchemy settings:
ENGINE = create_engine('postgresql://me:secret@localhost/my_db',
                       encoding='utf8', convert_unicode=True)
Session = sessionmaker(bind=ENGINE)
The code that produces the error just creates a session, instantiates a Document object, and saves it with a unicode string assigned to the source field.
Check this repo: it has an automated Vagrant/Ansible setup and reproduces this bug.
Your problem is here:
$ sudo grep client_encoding /etc/postgresql/9.3/main/postgresql.conf
client_encoding = sql_ascii
That causes psycopg2 to default to ASCII:
>>> import psycopg2
>>> psycopg2.connect('dbname=dev_db user=dev').encoding
'SQLASCII'
... which effectively shuts off psycopg2's ability to handle Unicode.
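To see why an ASCII client encoding produces exactly the error in the question, here is a minimal sketch that needs no database. It assumes the failing step boils down to encoding a unicode value with the 'ascii' codec; the helper name try_encode and the sample text are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Sample value containing U+2013 (EN DASH), the character from the traceback.
text = u'a \u2013 b'


def try_encode(value, codec):
    """Encode value with codec; return the bytes, or the exception on failure."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


# With an ASCII codec the en dash cannot be represented, so encoding fails.
ascii_result = try_encode(text, 'ascii')

# With UTF-8 the same value encodes without trouble.
utf8_result = try_encode(text, 'utf-8')

print(repr(ascii_result))
print(repr(utf8_result))
```

The first result is a UnicodeEncodeError with the same "ordinal not in range(128)" reason as above, while the second is an ordinary byte string.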
You can either fix this in postgresql.conf:
client_encoding = utf8
(and then run sudo invoke-rc.d postgresql reload), or you can specify the encoding explicitly when you create the engine:
self._conn = create_engine(src, client_encoding='utf8')
I recommend the former, because the early nineties are long gone. : )
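Whichever fix you pick, it may be worth failing fast at startup if the connection still reports a non-UTF-8 encoding. The following is only a sketch: is_utf8_encoding and check_client_encoding are hypothetical helpers, and the only thing assumed about the connection is that it exposes an encoding attribute, as the psycopg2 connection shown above does.

```python
def is_utf8_encoding(encoding_name):
    """Return True if the reported client encoding name denotes UTF-8."""
    # Normalise spellings such as 'utf8', 'UTF-8', 'SQL_ASCII' and 'SQLASCII'.
    normalised = encoding_name.replace('_', '').replace('-', '').upper()
    return normalised == 'UTF8'


def check_client_encoding(conn):
    """Raise early if conn (anything with an .encoding attribute) is not UTF-8."""
    if not is_utf8_encoding(conn.encoding):
        raise RuntimeError(
            'client encoding is %r, expected UTF8' % (conn.encoding,))
    return conn.encoding
```

For example, you could call check_client_encoding(psycopg2.connect('dbname=dev_db user=dev')) once when the application starts; with the sql_ascii setting shown earlier it would raise instead of letting the error surface later during a flush.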