Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy/MySQL binary blob is being utf-8 encoded?

I'm using SQLAlchemy and MySQL, with a files table to store files. That table is defined as follows:

mysql> show full columns in files;
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field   | Type         | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| id      | varchar(32)  | utf8_general_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
| created | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| updated | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| content | mediumblob   | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| name    | varchar(500) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

The content column of type MEDIUMBLOB is where the files are stored. In SQLAlchemy that column is declared as:

__maxsize__ = 12582912 # 12MiB                                                                                                                              
content = Column(LargeBinary(length=__maxsize__))                           

I am not quite sure about the difference between SQLAlchemy's BINARY type and LargeBinary type. Or the difference between MySQL's VARBINARY type and BLOB type. And I am not quite sure if that matters here.

Question: Whenever I store an actual binary file in that table, i.e. a Python bytes or b'' object , then I get the following warning

.../python3.4/site-packages/sqlalchemy/engine/default.py:451: Warning: Invalid utf8 character string: 'BCB121'
  cursor.execute(statement, parameters)

I don't want to just ignore the warning, but it seems that the files are in tact. How do I handle this warning gracefully, how can I fix its cause?

Side note: This question seems to be related, and it seems to be a MySQL bug that it tries to convert all incoming data to UTF-8 (this answer).

like image 346
Jens Avatar asked Jan 01 '16 21:01

Jens


1 Answers

Turns out that this was a driver issue. Apparently the default MySQL driver stumbles with Py3 and utf8 support. Installing cymysql into the virtual Python environment resolved this problem and the warnings disappear.

The fix: Find out if MySQL connects through socket or port (see here), and then modify the connection string accordingly. In my case using a socket connection:

mysql+cymysql://user:pwd@localhost/database?unix_socket=/var/run/mysqld/mysqld.sock

Use the port argument otherwise.

Edit: While the above fixed the encoding issue, it gave rise to another one: blob size. Due to a bug in CyMySQL blobs larger than 8M fail to commit. Switching to PyMySQL fixed that problem, although it seems to have a similar issue with large blobs.

like image 122
Jens Avatar answered Oct 28 '22 01:10

Jens