Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

enforce column encoding with sqlalchemy

I am using sqlalchemy to create the schema of my database. I have no success in enforcing the use of utf-8, no matter what I tried.

Here is a minimal python script that recreates my problem:

from sqlalchemy import create_engine, Column, Unicode
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine('mysql+mysqldb://user:password@localhost/multidic?charset=utf8', echo=True)
Base = declarative_base()
class MyTableName(Base):
    __tablename__ = "mytablename"
    test_column = Column(Unicode(2),primary_key=True)
Base.metadata.create_all(engine)

After running this script, when I look into the database, I see that the encoding is latin1 instead of utf-8:

mysql> SHOW FULL COLUMNS FROM mytablename;
+-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field       | Type       | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |
+-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| test_column | varchar(2) | latin1_swedish_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
+-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)

I have tried changing the type of the column created (String instead of Unicode), and tried also to add the argument encoding = "utf8" in the call to create_engine, but none of it worked.

So, my question is:

How to enforce the use of a given character encoding (utf-8 in my case) in MySQL, with sqlalchemy ?

Thank you :)

Notes:

I am using sqlalchemy 0.7 and python 2.7; I can possibly upgrade one or both, but only if it is the only solution!

I have mysql 5, and it supports utf-8:

mysql> show character set where charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)
like image 202
Jealie Avatar asked Sep 01 '13 17:09

Jealie


People also ask

How do I make a column unique in SQLAlchemy?

Unique constraints can be created anonymously on a single column using the unique keyword on Column . Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint table-level construct.

What does Create_all do in SQLAlchemy?

Sqlalchemy create_all method is used to create a new table into the database. This method will first check whether the table exists in the database or not if suppose it has found an existing table it will not create any table.

What does DB Drop_all () do?

You delete everything in the database using the db. drop_all() function to add the tags and post_tag tables safely and to avoid any of the common issues related to adding new tables to a database. Then you create all the tables anew using the db. create_all() function.

What is Create_engine in SQLAlchemy?

The create_engine() method of sqlalchemy library takes in the connection URL and returns a sqlalchemy engine that references both a Dialect and a Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.


2 Answers

To specify a specific collation per column, use the collation parameter on the data type:

class MyTableName(Base):
    __tablename__ = "mytablename2"
    test_column = Column(Unicode(2),
                         primary_key=True)
    test_column2 = Column(Unicode(2, collation='utf8_bin'))
#                                    ^^^^^^^^^^^^^^^^^^^^

Mind that MySQL understands this as both the set of codepoints to describe the text as well as the sort order the text will be indexed with; the usual suspects like 'utf8' or 'utf-8' won't be familiar to MySQL (use SHOW COLLATION to see the full list)

mysql> show full columns from mytablename2;
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field        | Type       | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| test_column  | varchar(2) | latin1_swedish_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
| test_column2 | varchar(2) | utf8_bin          | YES  |     | NULL    |       | select,insert,update,references |         |
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)

mysql> 
like image 106
SingleNegationElimination Avatar answered Sep 18 '22 13:09

SingleNegationElimination


For me the collation parameter did not work.

My connection string was:

db = create_engine('mysql+pymysql://user:pass@dbhost/schema?charset=utf8')

Pymysql was executing set names utf8 because of the charset, and the database was converting the utf8 to the encoding of the table, resulting in data loss.

If i left the charset out, the charset was defaulting to latin1, and pymysql was trying to encode my utf8 strings to latin1 before sending them to the database, thus throwing UnicodeEncode errors.

This worked for me :session.execute(text("SET NAMES latin1")) to make the database assume the utf8 strings i was sending did not need to be converted.

like image 21
Valnick Avatar answered Sep 19 '22 13:09

Valnick