I am translating a code from perl to python. Even if it works exactly the same, there is a part of the code that is 5x slower in python than in perl and I cannot figure out why.
Both perl and python are in the same machine, as well as the mysql database.
The code queries a db to download all columns of a table and then process each row. There are more than 5 million rows to process and the big issue is in retrieving the data from the database to the python processing.
Here I attach the two code samples: Python:
import os
import mysql.connector **<--- import mySqlDb**
import time
outDict = dict()
## DB parameters
db = mysql.connector.connect **<----- mySqlDb.connect( ...**
(host=dbhost,
user=username, # your username
passwd=passw, # your password
db=database) # name of the data base
cur = db.cursor(prepared=True)
sql = "select chr,pos,lengthofrepeat,copyNum,region from db.Table_simpleRepeat;"
cur.execute(sql)
print('\t eDiVa public omics start')
s = time.time()
sz = 1000
rows = cur.fetchall()
for row in rows:
## process out dict
print time.time() - s
cur.close()
db.close()
While here comes the Perl equivalent script:
use strict;
use Digest::MD5 qw(md5);
use DBI;
use threads;
use threads::shared;
my $dbh = DBI->connect('dbi:mysql:'.$database.';host='.$dbhost.'',$username,$pass)
or die "Connection Error!!\n";
my $sql = "select chr,pos,lengthofrepeat,copyNum,region from db.Table_simpleRepeat\;";
## prepare statement and query
my $stmt = $dbh->prepare($sql);
$stmt->execute or die "SQL Error!!\n";
my $c = 0;
#process query result
while (my @res = $stmt->fetchrow_array)
{
$edivaStr{ $res[0].";".$res[1] } = $res[4].",".$res[2];
$c +=1;
}
print($c."\n");
## close DB connection
$dbh->disconnect();
The runtime for these two scripts is:
I cannot figure out why this happens [I tried using fetchone() or fetchmany() to see if there are memory issues but the runtime at most reduces 10% from the 200s].
My main problem is understanding why there is such a relevant performance difference between the two functionally equivalent code blocks.
Any idea about how can I verify what is happening would be greatly appreciated.
Thanks!
Peeyush'comment could be an answer and I'd like him to post it because it allowed me to find a solution.
The problem is the python connector. I just changed that for mySqlDb module which is a C compiled module. That made the python code slightly faster than the perl code.
I added the changes in the python code with a <---- "" to show how easy it has been to gain performance.
the cursor.fetchall
means you load all your data in memory at once, instead of doing it slowly when needed.
Replace
row = cur.fetchall()
for row in rows:
by
for row in cur:
The problem is the python connector. I just changed that for mySqlDb module which is a C compiled module. That made the python code slightly faster than the perl code.
I added the changes in the python code with a <---- "" to show how easy it has been to gain performance
I encounter the same problem. With Python cx_Oracle, here's my environment performance stats -- Python takes very long to connect to Oracle DB.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With