python 5x slower than perl mySql query

Question

I am translating a code from perl to python. Even if it works exactly the same, there is a part of the code that is 5x slower in python than in perl and I cannot figure out why.

Both perl and python are in the same machine, as well as the mysql database.

The code queries a db to download all columns of a table and then process each row. There are more than 5 million rows to process and the big issue is in retrieving the data from the database to the python processing.

Here I attach the two code samples: Python:

import os
import mysql.connector **<--- import mySqlDb**
import time

outDict = dict()
## DB parameters
db = mysql.connector.connect **<----- mySqlDb.connect( ...** 
     (host=dbhost, 
user=username, # your username
passwd=passw, # your password
db=database) # name of the data base
cur = db.cursor(prepared=True)
sql = "select chr,pos,lengthofrepeat,copyNum,region from db.Table_simpleRepeat;"
cur.execute(sql)
print('	 eDiVa public omics start')
s  = time.time()
sz = 1000
rows = cur.fetchall()
for row in rows:
    ## process out dict    
print time.time() - s 
cur.close()
db.close()

While here comes the Perl equivalent script:

use strict;
use Digest::MD5 qw(md5);
use DBI;
use threads;
use threads::shared;

my $dbh = DBI->connect('dbi:mysql:'.$database.';host='.$dbhost.'',$username,$pass) 
    or die "Connection Error!!
";    
    my $sql = "select chr,pos,lengthofrepeat,copyNum,region from   db.Table_simpleRepeat\;";
    ## prepare statement and query
    my $stmt = $dbh->prepare($sql);
    $stmt->execute or die "SQL Error!!
";
    my $c = 0;
    #process query result
    while (my @res = $stmt->fetchrow_array) 
    {
        $edivaStr{ $res[0].";".$res[1] } = $res[4].",".$res[2]; 
        $c +=1;
    }
    print($c."
");
    ## close DB connection
    $dbh->disconnect();

The runtime for these two scripts is:

~40s for the Perl script
~200s for the Python script

I cannot figure out why this happens [I tried using fetchone() or fetchmany() to see if there are memory issues but the runtime at most reduces 10% from the 200s].

My main problem is understanding why there is such a relevant performance difference between the two functionally equivalent code blocks.

Any idea about how can I verify what is happening would be greatly appreciated.

Thanks!

UPDATE ABOUT SOLUTION

Peeyush'comment could be an answer and I'd like him to post it because it allowed me to find a solution.

The problem is the python connector. I just changed that for mySqlDb module which is a C compiled module. That made the python code slightly faster than the perl code.

I added the changes in the python code with a <---- "" to show how easy it has been to gain performance.

Arthur · Accepted Answer

the cursor.fetchall means you load all your data in memory at once, instead of doing it slowly when needed.

Replace

row = cur.fetchall()
for row in rows:

by

for row in cur:

mattia b · Answer

The problem is the python connector. I just changed that for mySqlDb module which is a C compiled module. That made the python code slightly faster than the perl code.

I added the changes in the python code with a <---- "" to show how easy it has been to gain performance

stw2019 · Answer

I encounter the same problem. With Python cx_Oracle, here's my environment performance stats -- Python takes very long to connect to Oracle DB.

connect to DB, elaps:0.38108
run query, elaps:0.00092
get filename from table, elaps:8e-05
run query to read BLOB, elaps:0.00058
decompress data and write to file, elaps:0.00187
close DB connection, elaps:0.00009
Over all, elaps:0.38476
same function in Perl, elaps:0.00213

python 5x slower than perl mySql query

Tags:

python

mysql

perl

UPDATE ABOUT SOLUTION

mattia b

3 Answers

Arthur

mattia b

stw2019

Recent Activity

Donate For Us

python 5x slower than perl mySql query

Tags:

python

mysql

perl

UPDATE ABOUT SOLUTION

mattia b

3 Answers

Arthur

mattia b

stw2019

Related questions

Recent Activity

Donate For Us