I am parsing JSON data and trying to store some of it in a MySQL database. I am currently getting the following Unicode error. How should I handle this?
Here is my table structure:
CREATE TABLE yahoo_questions (
    question_id varchar(40) NOT NULL,
    question_subj varbinary(255),
    question_content varbinary(255),
    question_userId varchar(40) NOT NULL,
    question_timestamp varchar(40),
    category_id varbinary(20) NOT NULL,
    category_name varchar(40) NOT NULL,
    choosen_answer varbinary(255),
    choosen_userId varchar(40),
    choosen_usernick varchar(40),
    choosen_ans_timestamp varchar(40),
    UNIQUE (question_id)
);
Error while inserting via Python code:
Traceback (most recent call last):
  File "YahooQueryData.py", line 78, in <module>
    +"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", (row[2], row[5], row[6], quserId, questionTime, categoryId, categoryName, qChosenAnswer, choosenUserId, choosenNickName, choosenTimeStamp))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/cursors.py", line 159, in execute
    query = query % db.literal(args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/connections.py", line 264, in literal
    return self.escape(o, self.encoders)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/connections.py", line 202, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 204-230: ordinal not in range(256)
Python Code segment:
# pushing user id to the url to get full json stack
urlobject = urllib.urlopen(base_url.format(row[2]))
qnadatajson = urlobject.read()
data = json.loads(qnadatajson)
cur.execute("INSERT INTO yahoo_questions (question_id, question_subj, question_content, question_userId, question_timestamp,"
            + "category_id, category_name, choosen_answer, choosen_userId, choosen_usernick, choosen_ans_timestamp)"
            + "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
            (row[2], row[5], row[6], quserId, questionTime, categoryId, categoryName,
             qChosenAnswer, choosenUserId, choosenNickName, choosenTimeStamp))
JSON structure:
questions: [
  {
    Id: "20111201185322AA5HTDc",
    Subject: "what are the new pokemon call?",
    Content: "I used to know them I stop at dialga and palkia version and I heard there's new ones what's it call ",
    Date: "2011-12-01 18:53:22",
    Timestamp: "1322794402",
Before running the query, I also executed the following in MySQL:
SET character_set_client = utf8
And this is how the MySQL variables look:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | latin1                                                 |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | latin1                                                 |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.5.10-osx10.6-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)
I think that your MySQLdb Python library doesn't know it's supposed to encode to utf8, and is instead encoding with the connection's default charset, latin1 (this is the 'latin-1' codec named in your traceback). When you connect() to your database, pass the charset='utf8' parameter. This should also make a manual SET NAMES or SET character_set_client unnecessary.
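To illustrate why the latin1 default fails, here is a minimal sketch that reproduces the same UnicodeEncodeError without any database. The helper name can_encode and the sample string are hypothetical, chosen only for illustration; they are not part of MySQLdb.

```python
# -*- coding: utf-8 -*-

def can_encode(text, charset):
    """Return True if every character in text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample: U+012C has code point 300, which is above the
# latin-1 limit of 255 ("ordinal not in range(256)" in the traceback).
sample = u"Pok\u00e9mon \u012c"

latin1_ok = can_encode(sample, "latin-1")  # fails on U+012C
utf8_ok = can_encode(sample, "utf-8")      # utf-8 covers all code points here
print(latin1_ok, utf8_ok)
```

Any text that passes the latin-1 check would have inserted without error under the old connection settings, which is probably why only some rows triggered the exception.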
First, make sure you are assigning the charset and use_unicode parameters when making your MySQL connection:
# assumes the MySQLdb module is imported under the alias "mysql"
import MySQLdb as mysql

conn = mysql.connect(host='127.0.0.1', user='user', passwd='passwd', db='db',
                     charset='utf8', use_unicode=True)
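Secondly, use parameterized queries when actually querying the database: pass the values as the second argument to execute() rather than formatting them into the SQL string yourself. Below is an example INSERT query of a string containing a Unicode character.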
cursor.execute('INSERT INTO mytable VALUES (null, %s)', ('Some string that contains unicode: ' + unichr(300),))
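For a wider table such as yahoo_questions, the placeholder list can be generated instead of typed by hand. The following is only a sketch: build_insert and RecordingCursor are hypothetical helpers invented for this example, and RecordingCursor merely stands in for a real MySQLdb cursor so the snippet runs without a database.

```python
def build_insert(table, columns):
    """Build a parameterized INSERT with one %s placeholder per column.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {0} ({1}) VALUES ({2})".format(
        table, ", ".join(columns), placeholders)


class RecordingCursor(object):
    """Stand-in for a DB-API cursor; it only records what execute() receives."""

    def __init__(self):
        self.calls = []

    def execute(self, query, args=None):
        self.calls.append((query, args))


columns = ["question_id", "question_subj", "question_content"]
query = build_insert("yahoo_questions", columns)

cur = RecordingCursor()
# Values travel separately from the SQL text; the driver escapes and encodes them.
cur.execute(query, (u"20111201185322AA5HTDc",
                    u"what are the new pokemon call?",
                    u"text with a non-latin1 character: \u012c"))
print(query)
```

With a real connection opened with charset='utf8', the same execute() call would be made on conn.cursor() instead of the stand-in.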