I am trying to update a database with values from a CSV file. The following is my code:
import MySQLdb as mdb
import sys
import csv

con = None
command = ''
new_name_list = []
old_name_list = []
duplicates = []
update_list = []

# Each CSV row is an (old_name, new_name) pair.
file = 'csv_file.csv'
listReader = csv.reader(open(file, 'r'))
for row in listReader:
    new_name_list.append(row)

try:
    con = mdb.connect('localhost', 'root', 'mypassword', 'mydb')
    con.autocommit(True)
    cur = con.cursor()
    cur.execute("SELECT fil_name FROM file WHERE fil_name like 'boy%' and fil_job_id=1")
    numrows = int(cur.rowcount)
    for i in range(numrows):
        file_name = cur.fetchone()
        old_name_list.append(file_name[0])
    # Map old names to new names.
    d = dict(new_name_list)
    for n in old_name_list:
        try:
            print n + " has been updated to " + d[n]
            command = "UPDATE file SET fil_name='" + d[n] + "' WHERE fil_name='" + n + "'"
            cur.execute(command)
        except KeyError:
            duplicates.append(n)
except mdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
    sys.exit(1)
finally:
    if con:
        con.close()
It takes about 2-3 seconds for each print to appear, which leads me to think that the UPDATE statements are executing slowly. I have a lot of values to update, and it should not be taking this long, given that I was able to get a quick printout of all the values of d[n].
Is there any way to speed up the updating?
EDIT: The database is using the InnoDB engine.
You could try using executemany:

data = [(d[n], n) for n in old_name_list if n in d]  # (new_name, old_name) pairs; skip names missing from the CSV
cur.executemany("UPDATE file SET fil_name=%s WHERE fil_name=%s", data)
Additionally, you may want to consider adding an index on fil_name (assuming fil_name is read-mostly, since an index slows down writes to that column slightly).
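A minimal sketch of adding such an index (the table and column names are from the question; the index name idx_fil_name is made up):

cur.execute("CREATE INDEX idx_fil_name ON file (fil_name)")

With the index in place, each UPDATE ... WHERE fil_name='...' can look up the matching row directly instead of scanning the whole table.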
According to your description, each print costs 2-3 seconds, so I think the problem is one of these two cases:
1. There is no index on fil_name, so every UPDATE has to scan the whole table to find the matching row.
2. The overhead of issuing many single-row statements (one round trip and, with autocommit on, one commit per UPDATE) dominates.
If it is case 1, just create an index on that column, so the update does not do a table scan.
If it is case 2, @dave gives a good answer.
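To check which case applies, one option is to EXPLAIN the equivalent SELECT and look at the type column; ALL means a full table scan. A sketch (the file name literal is just a placeholder):

cur.execute("EXPLAIN SELECT fil_name FROM file WHERE fil_name = 'some_old_name'")
for row in cur.fetchall():
    print row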