With sqlalchemy 0.9.7, I am trying to store emoji into MySQL 5.5 with utf8mb4 enabled. However, for some reason sqlalchemy is killing my emoji chars and I cannot figure out why. I can see the emoji char in the object before attempting to save the data via sqlalchemy. After the save, an error is thrown and emoji is rendered as ????.
The error is the following. Note it is surrounded by debug messages that output type(post.message)
and post.message
.
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:274]:
<type 'unicode'>
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:275]:
š
--------------------------------------------------------------------------------
/mnt/hgfs/crw/dev/hyper/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py:436: Warning: Incorrect string value: '\xF0\x9F\x98\x83' for column 'message' at row 1
cursor.execute(statement, parameters)
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:277]:
<type 'unicode'>
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:278]:
????
--------------------------------------------------------------------------------
My code looks like:
Base = declarative_base()
post = Table('post', Base.metadata,
Column("id", Integer, primary_key=True),
Column("message", UnicodeText),
Column("created_at", DateTime),
Column("updated_at", DateTime),
)
mapper(Post, post)
Is there any other setup I need to do to make this work?
Update: If I do the following:
Base = declarative_base()
post = Table('post', Base.metadata,
Column("id", Integer, primary_key=True),
Column("message", UnicodeText),
Column("created_at", DateTime),
Column("updated_at", DateTime),
mysql_engine='InnoDB',
mysql_charset='utf8mb4'
)
mapper(Post, post)
I get the following:
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:274]:
<type 'unicode'>
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:275]:
š
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:277]:
<type 'unicode'>
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in __init__ [/mnt/hgfs/crw/dev/hyper/hyper/blueprint/chat/__init__.py:278]:
????
--------------------------------------------------------------------------------
No error, but it is still obliterating the emoji. What is going on?!
Update 2:
Changing the database URI being called from:
mysql+mysqldb://user:pass@localhost/datab?charset=utf8
To
mysql+mysqldb://user:pass@localhost/datab
Fixed the problem. sqlalchemy threw an error when I tried to use charset=utf8mb4&use_unicode=0
. But is that wise? According to http://docs.sqlalchemy.org/en/rel_0_9/dialects/mysql.html#module-sqlalchemy.dialects.mysql.mysqldb, it is not! Would love any feedback on this solution.
Below's how I got emojis etc. working. I'm using Python 3.5 / Flask / Flask-SQLAlchemy.
Note: This fix assumes you're early enough in development that you're OK with recreating your database by running db.drop_all()
and db.create_all()
.
As recommended in this guide, open up a database console and run ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
As recommended in this answer, add ?charset=utf8mb4
to the end of your SQLALCHEMY_DATABASE_URI
string.
mysql+mysqlconnector://{username}:{password}@{hostname}/{databasename}
mysql+mysqlconnector://{username}:{password}@{hostname}/{databasename}?charset=utf8mb4
Now just rerun db.drop_all()
and db.create_all()
.
Here is the solution that worked for me. set_unicode
gets called the first time a connection enters the connection pool and only gets called once. SET NAMES
basically fixes the entire pipeline from your code to the database storage to ensure the correct UTF-8 charset (utf8mb4
) is used. As long as utf8mb4
is ensured end to end, emojis should be stored / rendered fine as they are just regular unicode characters outside of BMP (Basic Multilingual Plane), which unfortunately MySQL decided that it was a good idea to only implement 3-byte Unicode implementation for BMP as the unicode
charset.
See SQLAlchemy events reference for other relevant event hooks that you can tap into to do connection by connection configuration tweaks.
import logging
from sqlalchemy import event
logger = logging.getLogger(__name__)
@event.listens_for(Pool, "connect")
def set_unicode(dbapi_conn, conn_record):
cursor = dbapi_conn.cursor()
try:
cursor.execute("SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'")
except Exception as e:
logger.debug(e)
Update: regarding extra options in connection string, I prefer to be explicit and specify charset=utf8mb4
because my environment's default charset is utf8
(which is the culprit of why emoji cannot be encoded correctly), but never specify use_unicode=0
unless you are running on python 2.x and performance is a bottleneck as you pointed out in the link.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With