
Python has slow DB queries, but Perl does not

I use Python (Django) for my web shop.

When I tested it under high load (DB access), I got interesting results:

Python, 10 processes = 200 sec / 100% CPU utilisation
Perl,   10 processes =  65 sec /  35% CPU utilisation

CentOS 6, Python 2.6, MySQL 5.5, standard libraries; the MySQL server is on a separate machine. The table product_cars has 70,000,000 records.

Why is the Python program so slow?

Python program:

#!/usr/bin/python
import MySQLdb
import re
from MySQLdb import cursors
import shutil
import datetime
import random

db0 = MySQLdb.connect(user="X", passwd="X", db="parts")
cursor0 = db0.cursor()
cursor0.execute('SET NAMES utf8')

now = datetime.datetime.now()
for x in xrange(1, 100000):
    id = random.randint(10, 50000)
    cursor0.execute("SELECT * FROM product_cars WHERE car_id=%s LIMIT 500", [id])
    cursor0.fetchone()

Perl program:

#!/usr/bin/perl
use DBI;
my $INSTANCE=$ARGV[0];
my $user = "x";
my $pw = "x";
my $db = DBI->connect( "dbi:mysql:parts", "x", "x");
my $sql= "SELECT * FROM product_cars WHERE car_id=? LIMIT 500";
foreach $_ ( 1 .. 100000 )
{
 $random = int(rand(50000));
 $cursor = $db->prepare($sql);
 $cursor->execute($random) || die $cursor->errstr;
 @Data= $cursor->fetchrow_array();
}

$cursor->finish;
$db->disconnect;

update 1

An interesting thing: always selecting the row with id=1.

Clearly MySQL caches this query and it should be very fast, but Python is again slow at 100% CPU utilisation, while the same Perl or Ruby code runs quickly.

If I change the lines in the Python code:

# remove "SET NAMES utf8" string - this has no impact
# python-mysql use "%s", but not "?" as parameter marker
id = 1
for x in xrange(1, 100000):
    id = 1
    cursor0.execute("SELECT * FROM product_cars WHERE car_id=%s LIMIT 500", [id])
    cursor0.fetchone()

The same code in Perl:

foreach $_ ( 1 .. 20000 )
{
 $cursor = $db->prepare("SELECT * FROM product_cars WHERE car_id=? LIMIT 500");
 $cursor->execute(1);
#    while (my @Data= $cursor->fetchrow_array())
 if ($_ % 1000 == 0) { print "$_\n" };
 @Data= $cursor->fetchrow_array();
# print "$_\n";
}

Code in Ruby:

# "my" is an already-open MySQL connection (ruby mysql gem, not shown here)
pk=2
20000.times do |i|
    if i % 1000 == 0
        print i, "\n"
    end
    res = my.query("SELECT * FROM product_cars WHERE car_id='#{pk}' LIMIT 500")
    res.fetch_row
end

update 2

Exec SQL "SELECT * FROM product WHERE id=1" (string without params) 100000 times
Python: ~15 sec 100% CPU 100%
Perl:   ~9 sec CPU 70-90%
Ruby:   ~6 sec CPU 60-80%

The MySQL server is on another machine.
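
For reference, the Python loop for this test looks roughly like this (timing done with the time module; reconstructed sketch, the exact harness is not shown above):

#!/usr/bin/python
# Rough sketch of the update 2 test: the same literal SQL string is executed
# 100,000 times with no parameters, so no client-side interpolation happens.
import time
import MySQLdb

db0 = MySQLdb.connect(user="X", passwd="X", db="parts")
cursor0 = db0.cursor()

start = time.time()
for x in xrange(100000):
    cursor0.execute("SELECT * FROM product WHERE id=1")  # plain string, no params
    cursor0.fetchone()
print "%.1f sec" % (time.time() - start)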


update 3

I tried using oursql and pymysql - the results were worse.

Asked Dec 04 '11 by Andrew G


2 Answers

As people have pointed out, the two programs do not prepare and execute their statements in the same way, and neither follows recommended practice. Both should be taking advantage of prepared statements, and both should be preparing them outside the loop.

However, it looks like the Python MySQL driver does not take advantage of server-side prepared statements at all. This probably accounts for the poor performance.

Server side prepared statements were added in MySQL 4.1, but some drivers have been very slow to adapt. The MySQLdb users guide makes no mention of prepared statements and thinks "there are no cursors in MySQL, and no parameter substitution" which hasn't been true since MySQL 4.1. It also says "MySQLdb's Connection and Cursor objects are written in Python" rather than taking advantage of the MySQL API.
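
In other words, MySQLdb does the parameter substitution on the client and ships a finished SQL string to the server, roughly like this (a simplified sketch for illustration, not the driver's actual source):

#!/usr/bin/python
# Simplified illustration of what MySQLdb's cursor.execute() does with "%s"
# placeholders: escape the arguments, interpolate them into the SQL text,
# and send the finished string, so the server re-parses it on every call.
import MySQLdb

db0 = MySQLdb.connect(user="X", passwd="X", db="parts")

sql = "SELECT * FROM product_cars WHERE car_id=%s LIMIT 500"
args = (42,)
full_query = sql % db0.literal(args)   # client-side escaping + interpolation
db0.query(full_query)                  # plain-text query, no server-side prepare
print db0.store_result().fetch_row()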

You may want to look at the oursql driver. It looks like it was written to take advantage of the "new" MySQL API and let the database optimize itself.
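
If oursql works the way its documentation describes, the hot loop would look something like this (the connect() keyword arguments and the "?" qmark placeholder style below are assumptions based on oursql's documentation, not code from the question):

#!/usr/bin/python
# Sketch using the oursql driver, which runs parametrized statements through
# the server-side prepared-statement API (per its documentation).
import random
import oursql

conn = oursql.connect(host="127.0.0.1", user="X", passwd="X", db="parts")
curs = conn.cursor()
sql = "SELECT * FROM product_cars WHERE car_id=? LIMIT 500"   # qmark placeholder

for x in xrange(1, 100000):
    car_id = random.randint(10, 50000)
    curs.execute(sql, (car_id,))   # parameters sent separately from the SQL text
    curs.fetchone()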

DBD::mysql (the Perl MySQL driver) can take advantage of prepared statements, but according to the documentation it does not do so by default. You have to turn it on by adding mysql_server_prepare=1 to your DSN. That should make the Perl example run even faster. Or the documentation is lying and they're on by default.

As an aside, one thing that will throw off benchmarks, though it does not account for anything like a two-minute difference, is generating random numbers. They have a significant cost.

Python code

#!/usr/bin/python
import random

for x in xrange(1, 100000):
    id = random.randint(0, 50000)

Perl code

#!/usr/bin/perl
foreach $_ ( 1 .. 100000 )
{
 $random = int(rand(50000));
}

Python time

real    0m0.194s
user    0m0.184s
sys     0m0.008s

Perl time

real    0m0.019s
user    0m0.015s
sys     0m0.003s

To keep this from becoming an issue in more sensitive benchmarks, increment a counter instead.
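
For example, a minimal version of the Python micro-benchmark with the RNG taken out (the modulus just keeps the id in a similar range to the randint() call above):

#!/usr/bin/python
# Same loop as above, but stepping a counter instead of calling randint(),
# so random-number generation no longer contributes to the measured time.
for x in xrange(1, 100000):
    id = 10 + (x % 49990)   # deterministic id in the 10..49999 range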

Answered by Schwern


In theory, your Perl code should speed up significantly if you execute $cursor = $db->prepare($sql); before the loop and simply reexecute the same prepared query repeatedly. I suspect either DBI or MySQL has simply cached and ignored your repeated identical query preparations.

Your Python code, on the other hand, forces the query to be recompiled each time because you aren't using a prepared query. I'd expect the speed difference to evaporate if you prepare both queries properly before their loops. There are security benefits to using prepared queries as well, by the way.

Answered by Jeff Burdges