I am using sqlalchemy to create the schema of my database. I have no success in enforcing the use of utf-8, no matter what I tried.
Here is a minimal python script that recreates my problem:
from sqlalchemy import create_engine, Column, Unicode
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine('mysql+mysqldb://user:password@localhost/multidic?charset=utf8', echo=True)
Base = declarative_base()
class MyTableName(Base):
__tablename__ = "mytablename"
test_column = Column(Unicode(2),primary_key=True)
Base.metadata.create_all(engine)
After running this script, when I look into the database, I see that the encoding is latin1 instead of utf-8:
mysql> SHOW FULL COLUMNS FROM mytablename;
+-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| test_column | varchar(2) | latin1_swedish_ci | NO | PRI | NULL | | select,insert,update,references | |
+-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
I have tried changing the type of the column created (String instead of Unicode), and tried also to add the argument encoding = "utf8" in the call to create_engine, but none of it worked.
So, my question is:
How to enforce the use of a given character encoding (utf-8 in my case) in MySQL, with sqlalchemy ?
Thank you :)
I am using sqlalchemy 0.7 and python 2.7; I can possibly upgrade one or both, but only if it is the only solution!
I have mysql 5, and it supports utf-8:
mysql> show character set where charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)
Unique constraints can be created anonymously on a single column using the unique keyword on Column . Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint table-level construct.
Sqlalchemy create_all method is used to create a new table into the database. This method will first check whether the table exists in the database or not if suppose it has found an existing table it will not create any table.
You delete everything in the database using the db. drop_all() function to add the tags and post_tag tables safely and to avoid any of the common issues related to adding new tables to a database. Then you create all the tables anew using the db. create_all() function.
The create_engine() method of sqlalchemy library takes in the connection URL and returns a sqlalchemy engine that references both a Dialect and a Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.
To specify a specific collation per column, use the collation
parameter on the data type:
class MyTableName(Base):
__tablename__ = "mytablename2"
test_column = Column(Unicode(2),
primary_key=True)
test_column2 = Column(Unicode(2, collation='utf8_bin'))
# ^^^^^^^^^^^^^^^^^^^^
Mind that MySQL understands this as both the set of codepoints to describe the text as well as the sort order the text will be indexed with; the usual suspects like 'utf8' or 'utf-8' won't be familiar to MySQL (use SHOW COLLATION
to see the full list)
mysql> show full columns from mytablename2;
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| test_column | varchar(2) | latin1_swedish_ci | NO | PRI | NULL | | select,insert,update,references | |
| test_column2 | varchar(2) | utf8_bin | YES | | NULL | | select,insert,update,references | |
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)
mysql>
For me the collation parameter did not work.
My connection string was:
db = create_engine('mysql+pymysql://user:pass@dbhost/schema?charset=utf8')
Pymysql was executing set names utf8 because of the charset, and the database was converting the utf8 to the encoding of the table, resulting in data loss.
If i left the charset out, the charset was defaulting to latin1, and pymysql was trying to encode my utf8 strings to latin1 before sending them to the database, thus throwing UnicodeEncode errors.
This worked for me :session.execute(text("SET NAMES latin1"))
to make the database assume the utf8 strings i was sending did not need to be converted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With