
How does PostgreSQL PL/Python compare with Python outside it in terms of performance?

I run the exact same Python function in two ways: once as a PostgreSQL PL/Python function, and once outside PostgreSQL as an ordinary Python script.

Surprisingly, calling the PL/Python function with select * from pymax7(20000); takes 65 seconds on average, while running the plain Python script with python myscript.py 20000 takes 48 seconds on average. Each average was computed over 10 runs.
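For reference, averaging over 10 runs as described above could be scripted along these lines for the plain Python side. This is a minimal sketch: the helper name average_runtime and the small input size are illustrative assumptions, not what was actually used for the measurements.

```python
import timeit

def pymax7(b):
    a = 0
    for i in range(b):
        for ii in range(b):
            a = (((i + ii) % 100) * 149819874987)  # keeping Python busy
    return a

def average_runtime(func, arg, runs=10):
    """Return the mean wall-clock time in seconds of func(arg) over `runs` calls.

    Hypothetical helper: each run calls func(arg) exactly once.
    """
    timings = timeit.repeat(lambda: func(arg), repeat=runs, number=1)
    return sum(timings) / len(timings)

if __name__ == "__main__":
    # 200 keeps the sketch quick; the question used 20000.
    mean_seconds = average_runtime(pymax7, 200)
    print("mean over 10 runs: %.3f ms" % (mean_seconds * 1000))
```

Reporting the individual timings alongside the mean would also show how much the runs vary, which matters when comparing two averages that differ by seconds.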

Should such a difference be expected? How does Python inside the PostgreSQL RDBMS (PL/Python) compare with Python outside it in terms of performance?

I'm running PostgreSQL 9.1 and Python 2.7 on 64-bit Ubuntu 12.04.

PostgreSQL PL/Python:

CREATE FUNCTION pymax7 (b integer)
  RETURNS float
AS $$    
  a = 0
  for i in range(b):
    for ii in range(b):
      a = (((i+ii)%100)*149819874987) 
  return a
$$ LANGUAGE plpythonu;

Python:

import time
import sys

def pymax7 (b):     
    a = 0
    for i in range(b):
        for ii in range(b):
            a = (((i+ii)%100)*149819874987) # keeping Python busy
    return a

def main():    
    numIterations = int(sys.argv[1])        
    start = time.time()
    print pymax7(numIterations)
    end = time.time()
    print "Time elapsed in Python:"
    print str((end - start)*1000) + ' ms'        

if __name__ == "__main__":
    main()
Franck Dernoncourt asked May 15 '13 23:05



1 Answer

There shouldn't be any difference. Both of your test cases have about the same run time for me, 53 seconds plus or minus 1.

I did adjust the PL/Python test case to use the same measuring technique as the plain Python test case:

CREATE FUNCTION pymax7a (b integer)
  RETURNS float
AS $$
  import time
  start = time.time()
  a = 0
  for i in range(b):
    for ii in range(b):
      a = (((i+ii)%100)*149819874987)
  end = time.time()
  plpy.info("Time elapsed in Python: " + str((end - start)*1000) + ' ms')
  return a
$$ LANGUAGE plpythonu;

This tells you whether any non-Python overhead is involved. For what it's worth, on my machine the difference between the time this printed and the total time psql reported on the client was consistently less than 1 millisecond.
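The same idea, timing inside the function and comparing against the total seen by the caller, can be illustrated in plain Python. This is only a rough analogue, assuming the hypothetical helper names timed_inner and call_overhead; with PL/Python the outer figure would instead come from the client, for example the total time psql prints.

```python
import time

def timed_inner(b):
    """Run the loop and return (result, seconds measured inside the function)."""
    start = time.time()
    a = 0
    for i in range(b):
        for ii in range(b):
            a = (((i + ii) % 100) * 149819874987)
    return a, time.time() - start

def call_overhead(func, arg):
    """Return (inner_seconds, outer_seconds, outer - inner) for one call.

    The third value approximates the time spent outside the timed loop.
    """
    start = time.time()
    _, inner = func(arg)
    outer = time.time() - start
    return inner, outer, outer - inner

if __name__ == "__main__":
    inner, outer, overhead = call_overhead(timed_inner, 200)
    print("inner: %.3f ms, outer: %.3f ms, overhead: %.3f ms"
          % (inner * 1000, outer * 1000, overhead * 1000))
```

If the overhead figure is small relative to the inner time, the loop itself accounts for nearly all of the runtime and the calling environment is unlikely to explain a large gap.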

Peter Eisentraut answered Oct 22 '22 16:10
