Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python 5x slower than perl mySql query

Tags:

python

mysql

perl

I am translating a code from perl to python. Even if it works exactly the same, there is a part of the code that is 5x slower in python than in perl and I cannot figure out why.

Both perl and python are in the same machine, as well as the mysql database.

The code queries a db to download all columns of a table and then process each row. There are more than 5 million rows to process and the big issue is in retrieving the data from the database to the python processing.

Here I attach the two code samples: Python:

import os
import mysql.connector **<--- import mySqlDb**
import time

outDict = dict()
## DB parameters
db = mysql.connector.connect **<----- mySqlDb.connect( ...** 
     (host=dbhost, 
user=username, # your username
passwd=passw, # your password
db=database) # name of the data base
cur = db.cursor(prepared=True)
sql = "select chr,pos,lengthofrepeat,copyNum,region from db.Table_simpleRepeat;"
cur.execute(sql)
print('\t eDiVa public omics start')
s  = time.time()
sz = 1000
rows = cur.fetchall()
for row in rows:
    ## process out dict    
print time.time() - s 
cur.close()
db.close()        

While here comes the Perl equivalent script:

use strict;
use Digest::MD5 qw(md5);
use DBI;
use threads;
use threads::shared;

my $dbh = DBI->connect('dbi:mysql:'.$database.';host='.$dbhost.'',$username,$pass) 
    or die "Connection Error!!\n";    
    my $sql = "select chr,pos,lengthofrepeat,copyNum,region from   db.Table_simpleRepeat\;";
    ## prepare statement and query
    my $stmt = $dbh->prepare($sql);
    $stmt->execute or die "SQL Error!!\n";
    my $c = 0;
    #process query result
    while (my @res = $stmt->fetchrow_array) 
    {
        $edivaStr{ $res[0].";".$res[1] } = $res[4].",".$res[2]; 
        $c +=1;
    }
    print($c."\n");
    ## close DB connection
    $dbh->disconnect();    

The runtime for these two scripts is:

  • ~40s for the Perl script
  • ~200s for the Python script

I cannot figure out why this happens [I tried using fetchone() or fetchmany() to see if there are memory issues but the runtime at most reduces 10% from the 200s].

My main problem is understanding why there is such a relevant performance difference between the two functionally equivalent code blocks.

Any idea about how can I verify what is happening would be greatly appreciated.

Thanks!

UPDATE ABOUT SOLUTION

Peeyush'comment could be an answer and I'd like him to post it because it allowed me to find a solution.

The problem is the python connector. I just changed that for mySqlDb module which is a C compiled module. That made the python code slightly faster than the perl code.

I added the changes in the python code with a <---- "" to show how easy it has been to gain performance.

like image 794
mattia b Avatar asked Nov 06 '14 15:11

mattia b


3 Answers

the cursor.fetchall means you load all your data in memory at once, instead of doing it slowly when needed.

Replace

row = cur.fetchall()
for row in rows:

by

for row in cur:
like image 192
Arthur Avatar answered Sep 21 '22 08:09

Arthur


The problem is the python connector. I just changed that for mySqlDb module which is a C compiled module. That made the python code slightly faster than the perl code.

I added the changes in the python code with a <---- "" to show how easy it has been to gain performance

like image 22
mattia b Avatar answered Sep 22 '22 08:09

mattia b


I encounter the same problem. With Python cx_Oracle, here's my environment performance stats -- Python takes very long to connect to Oracle DB.

  • connect to DB, elaps:0.38108
  • run query, elaps:0.00092
  • get filename from table, elaps:8e-05
  • run query to read BLOB, elaps:0.00058
  • decompress data and write to file, elaps:0.00187
  • close DB connection, elaps:0.00009
  • Over all, elaps:0.38476
  • same function in Perl, elaps:0.00213
like image 40
stw2019 Avatar answered Sep 22 '22 08:09

stw2019