Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python hangs during a cursor.execute() using Psycopg2

I currently have a problem with my Python script that when I execute a SQL function from cursor.execute() for a specific function, It runs the function, function completes, and then the script hangs for somewhere close to five hours before resuming the rest of the program.

I watch the server status of my function in pgAdminIII as it is called from Python and when the function completes the server just shows an open connection to my python-user. Before it would idle in transaction, but after adding conn.set_isolation_level(0), it only shows the open connection.

I have also put prints surrounding the cursor.execute() but it takes five hours to proceed to the following print statement. I have tried killing the connection and restarting it to no avail. I stripped the function call completely out of my script so it was the only process my script handled. Nothing has worked.

The difficulty is that both sides keep pointing to the other as the source of the problem. I am assuming it is some sort of interaction effect from this specific function. Perhaps a signal is not being sent back to python from postgres after running the specific function? None of the other functions have an issue with the same method of execution.

Ideas? Solutions? -I was considering a timeout, but the runtime can vary anywhere from 20 minutes to 2 hours. -The function it runs makes a table and returns only a string stating 'Complete', all operations are accounted for in the database after it disappears as an active function in server status.

Here is the code I use for replicating the problem, isolated from my main script:

#!/usr/bin/env python

import sys
import os
import smtplib
import ftplib
import gzip
import logging
import shutil
import argparse
import traceback
import subprocess
from email.mime.text import MIMEText

import psycopg2
import psycopg2.extras
import psycopg2.extensions
from psycopg2 import OperationalError

pg_db    = "ppl"
pg_port  = "5432"
pg_user  = "<user>"
pg_pass  = "<pass>"
pg_host  = "<host>"

# connect to database
try:
    conn = psycopg2.connect(database=pg_db,
                            port=pg_port, 
                            user=pg_user, 
                            password=pg_pass, 
                            host=pg_host)
    conn.set_isolation_level(0)
except:
    print "Unable to connect to the database."

cur = conn.cursor()

print 'Starting select <function1>'
cur.execute("select <function1>(999999, null );")
print 'Assign variable...'
a = cur.fetchall()
print 'Done'

print 'Starting select <function2>'
cur.execute("select <function2>(999999, null );")
print 'Assign variable...'
b = cur.fetchall()
print 'Done'

print a
print b

conn.close()
sys.exit()

Then I ran a strace as suggested with these results:

<previous function statements for about 4k lines>

write(1, "Starting select <function>"..., 45) = 45
sendto(3, "Q\0\0\0009select <function>"..., 58, MSG_NOSIGNAL, NULL, 0) = 58
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "T\0\0\0001\0\1<function_name>\0"..., 16384, 0, NULL, NULL) = 500
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0mSINFO\0C00000\0Mv_eval_table "..., 16384, 0, NULL, NULL) = 110
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\4\247SINFO\0C00000\0Mcmd := CREATE"..., 16384, 0, NULL, NULL) = 1192
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\23hSINFO\0C00000\0Mcmd := \t--For"..., 16384, 0, NULL, NULL) = 4969
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 84
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 672
...
brk(0xbe9000)                           = 0xbe9000
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 84
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 168
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 1008
...

<does this for about 600k more lines before executing the rest of the script, I am assuming it is indefinitely looping until a timeout>

Next I ran a cProfile for a function that wrapped the second block of code that called the second function:

Tue Jan 15 16:04:50 2013    function2_prof

         7 function calls in 15755.988 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'cursor' of 'psycopg2._psycopg.connection' objects}
        1    0.000    0.000 15755.988 15755.988 test_sql_sub2.py:48(LR2)
        1 15755.988 15755.988 15755.988 15755.988 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
        1    0.000    0.000 15755.988 15755.988 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'close' of 'psycopg2._psycopg.cursor' objects}
        1    0.000    0.000    0.000    0.000 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
like image 470
BagoDev Avatar asked Nov 04 '22 06:11

BagoDev


2 Answers

The issue was due to a single line in multiple for-loops in the SQL function:

raise info 'license_row_id := %', rec.license_row_id;

These raise info messages were somehow not handled appropriately by python as they were by bash and pgAdmin. My guess is that because they were unhandled, they kept looping and processing and asking for more memory until it was unable to process, finally dropping the raise infos without warning or error and resuming the program.

I commented out this needless raise info line and the whole script ran in 20 minutes, a normal run time for the program. If anyone has an idea how to make python handle raise info messages from postgresql that would probably even further answer this problem.

like image 107
BagoDev Avatar answered Nov 15 '22 06:11

BagoDev


I think your function does more than you realize. Look at your strace:

brk(0xbe9000)                           = 0xbe9000
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 84
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 168
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 1008

brk means "give me more memory", so clearly Python is accumulating data. poll checks whether there's any data waiting; a positive return value (here, 1) means there is. And recvfrom reads that data and returns the number of bytes read.

So it looks like Postgres is sending you a lot of data, so much that it takes a significant amount of time to receive. I don't know much about Postgres's wire format, but the license_row_i... in every poll sure sounds like you're getting back tons and tons of rows. The function no longer shows as executing because it isn't; the work is done, and Postgres is now just sending the results back.

It's no wonder both sides are blaming the other; the problem is in the middle. :)


edit: Aha, more information implicates RAISE INFO. Psycopg2 does indeed store all of those notices, on the connection object.

It only keeps the last fifty, but if you're sending over half a million notices to the client, it'll take a while to keep turning them into Python strings, throwing away the oldest, appending the newest, etc. If the command-line client is ignoring them entirely (or even just doing this all in C), it'll naturally be much faster.

So the easy solution is "don't do that". :) You can also configure Postgres not to send these to the client, but of course they'd end up useless for debugging. There's no way, that I can see, to ask psycopg2 not to collect notices sent to the client.

like image 28
Eevee Avatar answered Nov 15 '22 05:11

Eevee