[Situation-Aim]
We have a data-set in CSV of 1 million rows. These need to be imported into a mysql 5.6 database, table with innodb engine, with following columns (matching csv file):
Column1: bigint(15)
Column2: datetime
To import this data, we could use the following python script:
#python version: 3.4.3
#mysql connector version: 2.0.4
#Note: this is just a simple example script!
import mysql.connector
import csv
#Set-up db connection
cnx = mysql.connector.connect(user='user', password='secret', database='mydatabase')
cursor = cnx.cursor()
#read/ open csv
reader = csv.reader(open("C:/test.csv", "rt"))
#ignore header
next(reader)
#read CSV and send to mysql
for row in reader:
id=row[0]
adate = row[1]
cursor.execute("""INSERT INTO mytable VALUES ('%s','%s')""" % (id,adate))
#commit the query
cnx.commit()
[Questions]
What is conceptually happening when I put the cnx.commit() inside the for loop as follows:
for row in reader:
id=row[0]
adate = row[1]
cursor.execute("""INSERT INTO mytable VALUES ('%s','%s')""" % (id,adate))
cnx.commit()
Thanks!
You've shown two extreme positions.
One is to commit only after your megarow of insertions. The other is to commit after every row.
Both of these have poor performance. The first is bad because MySQL creates a big transaction, and then at the end has to commit the whole thing. This takes a lot of RAM or temp space. The second is bad because it commits many little tiny transactions.
The best performing way to do this kind of bulk insert is to commit every thousand rows or so. But that's just a little harder to program than the alternatives. In my experience, committing after each thousand rows or so is several times faster than the alternatives, so it's worth the small amount of extra complexity.
Another fast (but brittle) way to do this kind of bulk load is with LOAD DATA INFILE
, about which you can read here: https://dev.mysql.com/doc/refman/5.6/en/load-data.html
There's one little twist on your megarow commit: MySQL connections can have an autocommit mode set. In that case your first example is equivalent to your second. By default that mode is disabled in the python connector.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With