 

Memory leak with large dataset when using mysql-python

Tags: python, mysql

I am experiencing what I believe is a memory leak when using the MySQLdb API:

Line #    Mem usage    Increment   Line Contents
================================================
     6                             @profile
     7    10.102 MB     0.000 MB   def main():
     8    10.105 MB     0.004 MB       connection = MySQLdb.connect(host="localhost", db="mydb",
     9    11.285 MB     1.180 MB                                    user="notroot", passwd="Admin123", use_unicode=True)
    10    11.285 MB     0.000 MB       cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
    11                                 
    12    11.289 MB     0.004 MB       cursor.execute("select * from a big table;")
    13                                 
    14   254.078 MB   242.789 MB       results = [result for result in cursor]
    15   251.672 MB    -2.406 MB       del results
    16   251.672 MB     0.000 MB       return

Also, when exploring the heap with guppy/hpy, it shows that most of my memory is occupied by unicode objects, ints and datetime objects (very likely the rows returned by the MySQLdb API).
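
For reference, this is roughly the heap inspection I mean, as a minimal sketch assuming guppy is installed and reusing the connection settings from the profile above:

from guppy import hpy

import MySQLdb
import MySQLdb.cursors

heap_inspector = hpy()

connection = MySQLdb.connect(host="localhost", db="mydb",
                             user="notroot", passwd="Admin123",
                             use_unicode=True)
cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
cursor.execute("select * from a big table;")

results = [result for result in cursor]  # materializes every row in a list
print heap_inspector.heap()              # unicode, int and datetime objects dominate

cursor.close()
connection.close()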

I'm using Python 2.7.3, mysql-python==1.2.4 on Ubuntu 12.04 and profiled with memory_profiler.

Could this be interning as described in http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm ?

Am I missing any references dangling around?
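
To sanity-check the interning theory without MySQLdb involved at all, here is a minimal sketch (assuming memory_profiler is installed; the row count is arbitrary) that builds and deletes a large list of small objects and reports the line-by-line memory usage:

from memory_profiler import profile


@profile
def build_and_drop():
    # Roughly the shape of a large result set: tuples of ints and unicode.
    data = [(i, unicode(i)) for i in xrange(500000)]
    del data
    # If "Mem usage" stays high after the del, the memory is being held by
    # CPython's allocator/free lists rather than leaked by MySQLdb.
    return


if __name__ == "__main__":
    build_and_drop()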

EDIT: I also closed the cursor and connection but still got similar results.

SOLVED: Facepalm. I was doing a list comprehension, which naturally kept everything in memory. When the iterator is consumed properly (streaming to a file or something), memory usage is decent:

Line #    Mem usage    Increment   Line Contents
================================================
    16                             @profile
    17    10.055 MB     0.000 MB   def main():
    18    10.059 MB     0.004 MB       connection = MySQLdb.connect(host="localhost", db="mydb",
    19    11.242 MB     1.184 MB                                    user="notroot", passwd="Admin123", use_unicode=True)
    20    11.242 MB     0.000 MB       cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
    21                                 
    22    11.246 MB     0.004 MB       cursor.execute("select * from big table")
    23    11.246 MB     0.000 MB       count = 0
    24    30.887 MB    19.641 MB       for result in cursor:
    25    30.887 MB     0.000 MB           count = count + 1
    26    30.895 MB     0.008 MB       cursor.close()
    27    30.898 MB     0.004 MB       connection.close()
    28    30.898 MB     0.000 MB       return
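
For completeness, a minimal sketch of the streaming-to-a-file variant mentioned above, reusing the connection settings from the question (the output filename and utf-8 encoding are just placeholders):

import csv

import MySQLdb
import MySQLdb.cursors

connection = MySQLdb.connect(host="localhost", db="mydb",
                             user="notroot", passwd="Admin123",
                             use_unicode=True)
cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
cursor.execute("select * from a big table;")

with open("dump.csv", "wb") as output_file:
    writer = csv.writer(output_file)
    for row in cursor:
        # The csv module in Python 2 handles bytes, so encode unicode fields.
        writer.writerow([unicode(field).encode("utf-8") for field in row])

cursor.close()
connection.close()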
asked Jun 14 '13 by Andrei Coman

1 Answer

Solved by the OP. His original code contained the line

results = [result for result in cursor]

This list comprehension stored the entire result in memory, rather than streaming it from the server as needed. The OP replaced it with a simple

for result in cursor:
    ...

and saw his memory usage go back to normal.
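
For completeness, a minimal sketch of that streaming pattern, with the connection details copied from the question. One caveat worth noting with SSCursor: consume all rows or close the cursor before issuing another query on the same connection, otherwise the MySQL client reports a "commands out of sync" error.

import MySQLdb
import MySQLdb.cursors

connection = MySQLdb.connect(host="localhost", db="mydb",
                             user="notroot", passwd="Admin123",
                             use_unicode=True)
cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
cursor.execute("select * from a big table;")

count = 0
for result in cursor:   # rows are streamed from the server one at a time
    count += 1          # process each row here, then let it be garbage collected

cursor.close()          # drains any unfetched rows so the connection stays usable
connection.close()
print count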

answered Nov 15 '22 by Quuxplusone