Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python MySQLdb module memory leak

the popular python mysql module "MySQLdb" seems to have a memory leak issue. here's the code:

conn = MySQLdb.connect(...)
cursor = conn.cursor(cursorclass = MySQLdb.cursors.DictCursor)
sql = "select * from `test`"
cursor.execute(sql)  #leak start
cursor.close()
conn.close()
time.sleep(20)

Assume the test is a table with billion records. I ran the python code and execute

ps aux | awk '{print $2, $4, $11}' | grep python

at the same time, the results is that the memory use increased to 47.0% and never go back, even I close the cursor and the conn. Any ideas?

like image 451
Chenglu Avatar asked Mar 26 '14 10:03

Chenglu


1 Answers

In this post, Fredrik Lundh explains why memory may not get returned to the system, even though it is not a memory leak. Near the bottom he explains why (in Python2) range(50*1024*100) may consume a lot of memory that will not get freed, even after deleting the list. And he mentions that using xrange is a way to avoid the memory problem in the first place.

Similarly, using an SSDictCursor instead of a DictCursor may be a way to avoid the memory problem in your case. An SSDictCursor causes the MySQL server to retain the result set on the server side, and the cursor will fetch rows from the result set one-at-a-time only as needed:

import MySQLdb
import MySQLdb.cursors as cursors
conn = MySQLdb.connect(..., cursorclass=cursors.SSDictCursor) #1
cursor = conn.cursor()
cursor.execute('select * from test')  #2
for row in cursor:                    #3
    print(row)
conn.close()
  1. Note the cursorclass=cursors.SSDictCursor in the call to connect.
  2. With a DictCursor (or any non-SS cursor), this call to execute would cause MySQLdb to load the entire result set into a Python object (such as a list of dicts).
  3. With an SSDictCursor, MySQLdb retrieves the rows one at a time.

So this will avoid the memory build-up problem, provided you don't need to hold the entire result set in Python all at once.

Note also that when using an SSCursor or SSDictCursor, "No new queries can be issued on the connection until the entire result set has been fetched." Cursors from two different connections can be used concurrently. That may not be an issue for you, but it's something to be aware of.

You might also want to check out oursql, an alternative DB adapter for MySQL. oursql cursors are server-side cursors which fetch lazily by default.

like image 188
unutbu Avatar answered Sep 18 '22 12:09

unutbu