I am extremely confused and puzzled by how I store strings with unusual characters (to someone who is used to dealing with a UK English character set) in them.
Here is my example.
I have this name: Bientôt l'été
This is how I created my table:
CREATE TABLE MyTable(
    'my_id' INT(10) unsigned NOT NULL,
    'my_name' TEXT CHARACTER SET utf8 NOT NULL,
    PRIMARY KEY(`my_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Using this simplified python script I am trying to insert the string into a MySQL database and table:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import MySQLdb
mystring = "Bientôt l'été"
myinsert = [ { "name" : mystring.encode("utf-8").strip()[:65535], "id" : 1 } ]
con = None
con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi');
cur = con.cursor()
sql = "INSERT INTO 'MyTable' ( 'my_id', 'my_name' ) VALUES ( %(id)s, %(name)s ) ; "
cur.executemany( sql, myinsert )
con.commit()
if con: con.close()
If I then try to read the name in the database it is stored as: Bientôt l'été
I want it to read: Bientôt l'été
How do I get the python script/MySQL database to do this? I think this is something to do with the character set and how it is set but I can't find a simple web page that explains this without any technical jargon. I've been struggling with this for hours!
I have looked at this and I see character_set_server is set as latin1 but I don't know if this is the problem or how to change it:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
                MySQL supports multiple Unicode character sets: utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character.
To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.
Did you try, this query set names utf8;
#!/usr/bin/python
# -*- coding: utf-8 -*-
import MySQLdb
mystring = "Bientôt l'été"
myinsert = [{ "name": mystring.encode("utf-8").strip()[:65535], "id": 1 }]
con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi');
cur = con.cursor()
cur.execute("set names utf8;")     # <--- add this line,
sql = "INSERT INTO 'MyTable' ( 'my_id', 'my_name' ) VALUES ( %(id)s, %(name)s ) ; "
cur.executemany( sql, myinsert )
con.commit()
if con: con.close()
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With