
Python MySQLdb slow in updating values

Tags: python, mysql

I am trying to update a database with values from a CSV file. The following is my code:

import MySQLdb as mdb
import sys
import csv

con = None
command = ''
new_name_list = []
old_name_list = []
duplicates = []
update_list = []
file = 'csv_file.csv'
listReader = csv.reader(open(file, 'r'))
for row in listReader:
    new_name_list.append(row)

try:

    con = mdb.connect('localhost', 'root', 'mypassword', 'mydb')
    con.autocommit(True)

    cur = con.cursor()
    cur.execute("SELECT fil_name FROM file WHERE fil_name like 'boy%' and fil_job_id=1")    

    numrows = int(cur.rowcount)

    for i in range(numrows):
        file_name = cur.fetchone()
        old_name_list.append(file_name[0])  # collect every matching old file name

    d = dict(new_name_list)

    for n in old_name_list:
        try:
            print n + " has been updated to " +  d[n]
            command = "UPDATE file SET fil_name='" + d[n] + "' WHERE fil_name='" + n + "'"
            cur.execute(command)
        except KeyError:
            duplicates.append(n)

except mdb.Error, e:

    print "Error %d: %s" % (e.args[0],e.args[1])
    sys.exit(1)

finally:    

    if con:    
        con.close()

It takes about 2-3 seconds for each print to appear, which leads me to think that the UPDATE statements are executing slowly. I have a lot of values to update, and it should not be running this slowly (printing all the values of d[n] by itself was quick, so the dictionary lookups are not the bottleneck).

Is there any way to speed up the updates?

EDIT: The database is using InnoDB engine

Zerhinne asked Jun 14 '12 01:06

2 Answers

You could try using executemany:

data = [(d[n], n) for n in old_name_list if n in d]
cur.executemany("UPDATE file SET fil_name = %s WHERE fil_name = %s", data)

(The new name comes first to match the SET ... WHERE order, the placeholders are left unquoted so the driver handles quoting, and names missing from d are skipped, as in your KeyError handling.)

Additionally, you may want to consider indexing fil_name (assuming fil_name is read-mostly)
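
If it is not indexed yet, a minimal sketch of adding one (the index name idx_fil_name is just an illustration) would be:

# Run once; with an index each UPDATE can locate the row by fil_name
# instead of scanning the whole table.
cur.execute("CREATE INDEX idx_fil_name ON file (fil_name)")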

dave answered Nov 07 '22 23:11

According to your description, each print takes 2-3 seconds, so I think the problem is probably one of the following:

  1. Do you have an index on the fil_name column of the file table?
  2. You have autocommit set to true, so every UPDATE is committed as its own transaction.

If it is case 1, just create an index on that column so that each UPDATE does not have to scan the whole table.

If it is case 2, @dave's executemany answer is a good approach; committing once at the end instead of after every statement helps as well (see the sketch below).
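
A minimal sketch of case 2, assuming the same con, cur, d and old_name_list as in the question: turn autocommit off and commit the whole batch once.

con.autocommit(False)   # stop committing after every single UPDATE
data = [(d[n], n) for n in old_name_list if n in d]
cur.executemany("UPDATE file SET fil_name = %s WHERE fil_name = %s", data)
con.commit()            # one commit for the entire batch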

yancl answered Nov 07 '22 23:11