The error message I got was:
Traceback (most recent call last):
  File "./test.py", line 416, in <module>
    startup()
  File "./test.py", line 275, in startup
    writer.save(r,data)
  File "/home/user/project/test/output.py", line 91, in save
    self.save_doc(r, data, pid)
  File "/home/user/project/test/output.py", line 130, in save_doc
    cursor.execute(dbquery)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/util.py", line 34, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/mysql/base.py", line 86, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-x86_64.egg/MySQLdb/cursors.py", line 175, in execute
  File "/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-x86_64.egg/MySQLdb/cursors.py", line 89, in _warning_check
_mysql_exceptions.Warning: Data truncated for column 'url' at row 1
The first thing I checked was whether the URL string is longer than the field length, but it is actually much shorter.
Database Design
+----------+-------------------------------------------------------------------+
| Database | Create Database |
+----------+-------------------------------------------------------------------+
| myurlcol | CREATE DATABASE `myurlcol` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-------------------------------------------------------------------+
Table Design
CREATE TABLE `document` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(255) collate utf8_bin NOT NULL,
`md5` varchar(32) collate utf8_bin NOT NULL,
`host` varchar(255) collate utf8_bin default NULL,
`content_sha1` varchar(40) collate utf8_bin NOT NULL,
`add_date` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`),
UNIQUE KEY `md5` (`md5`),
KEY `main_crawl_document_content_sha1` (`content_sha1`),
KEY `main_crawl_document_discover_date` (`add_date`),
KEY `main_crawl_document_host` (`host`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
I printed out the lengths of the data I am trying to insert into the table (I am testing with only one URL):
len(url) = 89
len(md5) = 32
len(host) = 20
len(content_sha1) = 40
len(add_date) = 19
I was using a cursor created by django.db.connection. To be more informative, here is the complete database query I passed to cursor.execute():
INSERT INTO main_document SET url='ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz',md5='b6ba3adde8de87e4dc255092b04d07ea',host='ftp.ncbi.nlm.nih.gov',content_sha1='9aeab4412cc9b1add84a6d2bca574664e193b56e',add_date='2012-05-15 00:00:00';
The funny thing is that the command above works when I copy and paste it into the MySQL command line. There are no error messages and the data is inserted correctly.
What is going wrong?
How do I fix truncated data? Fix the data that you are trying to insert or update: its length should not exceed the maximum allowed for the particular column. Note that 'SET ANSI_WARNINGS OFF', which silently truncates data to the column's maximum string length, is a SQL Server setting and does not exist in MySQL, where truncation handling depends on the server's SQL mode.
Truncated means "cut short", and "data truncated" warnings or errors refer to a value's data being cut off at the end during the import process (e.g. a value of "2.9823" being imported as "2.98"). These are important warnings, because they notify us that our data has not been imported accurately.
First, check whether the data type of the column is right for the input data. Maybe its defined length is smaller than it should be, or maybe there is a misalignment that causes a value to be stored in a field with a different data type.
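As a quick way to rule out the "value longer than the column" case, here is a minimal sketch that compares each value against the VARCHAR limits copied from the CREATE TABLE statement in the question. The names COLUMN_LIMITS and find_oversized are hypothetical, made up for this illustration, and the check assumes the limits are counted in characters.

```python
# Column limits copied from the CREATE TABLE statement in the question.
# COLUMN_LIMITS and find_oversized are hypothetical names for this sketch.
COLUMN_LIMITS = {"url": 255, "md5": 32, "host": 255, "content_sha1": 40}


def find_oversized(row, limits=COLUMN_LIMITS):
    """Return {column: (length, limit)} for every value longer than its column."""
    oversized = {}
    for column, limit in limits.items():
        value = row.get(column)
        if value is not None and len(value) > limit:
            oversized[column] = (len(value), limit)
    return oversized


row = {
    "url": "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz",
    "md5": "b6ba3adde8de87e4dc255092b04d07ea",
    "host": "ftp.ncbi.nlm.nih.gov",
    "content_sha1": "9aeab4412cc9b1add84a6d2bca574664e193b56e",
}

# An empty dict means every value fits its column.
print(find_oversized(row))
```

If this reports nothing for the values you are about to insert, the string itself is not too long, and the problem is more likely in how the query text is built or processed before it reaches MySQL.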
cursor.execute handles MySQL escaping if you use it correctly. The basic idea is to use %s in the raw SQL string where you are currently including the raw values, and then pass a second argument to cursor.execute(), which is a tuple (or list) of the values in order. In your case, this would look like:
url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz'
md5 = 'b6ba3adde8de87e4dc255092b04d07ea'
host = 'ftp.ncbi.nlm.nih.gov'
content_sha1 = '9aeab4412cc9b1add84a6d2bca574664e193b56e'
add_date = '2012-05-15 00:00:00'
sql = "INSERT INTO main_document SET url = %s, md5 = %s, host = %s, content_sha1 = %s, add_date = %s"
cursor.execute(sql, (url, md5, host, content_sha1, add_date))
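To see parameter binding work end to end without a MySQL server, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption for illustration only: sqlite3 uses ? as its placeholder where the MySQL driver uses %s, and the two-column table here is a cut-down version of the real one. The point is just that a value containing %28 and %29 comes back unchanged when it is passed as a parameter instead of being pasted into the SQL string.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_document (url VARCHAR(255) NOT NULL, md5 VARCHAR(32) NOT NULL)"
)

url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz"
md5 = "b6ba3adde8de87e4dc255092b04d07ea"

# The values travel separately from the SQL text, so the driver never
# interprets the percent signs inside the URL.
conn.execute("INSERT INTO main_document (url, md5) VALUES (?, ?)", (url, md5))

stored_url = conn.execute("SELECT url FROM main_document").fetchone()[0]
print(stored_url == url)
conn.close()
```

A likely explanation for the original warning, assuming the driver applies Python's % formatting to the query string even when no parameters are passed, is that the %28 and %29 sequences in the URL are read as format directives, so the value that reaches MySQL is no longer the 89-character string you measured. Passing the values as parameters avoids that.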