I am experiencing what I believe is a memory leak when using the MySQLdb API
Line # Mem usage Increment Line Contents
================================================
6 @profile
7 10.102 MB 0.000 MB def main():
8 10.105 MB 0.004 MB connection = MySQLdb.connect(host="localhost", db="mydb",
9 11.285 MB 1.180 MB user="notroot", passwd="Admin123", use_unicode=True)
10 11.285 MB 0.000 MB cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
11
12 11.289 MB 0.004 MB cursor.execute("select * from a big table;")
13
14 254.078 MB 242.789 MB results = [result for result in cursor]
15 251.672 MB -2.406 MB del results
16 251.672 MB 0.000 MB return
Also when exploring the heap with guppy
/hpy
it shows that most of my memory is occupied by unicode objects, ints and datetime objects (very likely to be to rows return by the MySQLdb API).
I'm using Python 2.7.3, mysql-python==1.2.4
on Ubuntu 12.04 and profiled with memory_profiler
.
Could this be interning as described in http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm ?
Am I missing any references dangling around?
EDIT: I also closed the cursor and connection but still got similar results.
SOLVED: Facepalm. I was doing a list comprehension with naturally kept everything in memory. When consuming the iterator properly (streaming to a file or something) it has decent memory usage.
Line # Mem usage Increment Line Contents
================================================
16 @profile
17 10.055 MB 0.000 MB def main():
18 10.059 MB 0.004 MB connection = MySQLdb.connect(host="localhost", db="mydb",
19 11.242 MB 1.184 MB user="notroot", passwd="Admin123", use_unicode=True)
20 11.242 MB 0.000 MB cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
21
22 11.246 MB 0.004 MB cursor.execute("select * from big table")
23 11.246 MB 0.000 MB count = 0
24 30.887 MB 19.641 MB for result in cursor:
25 30.887 MB 0.000 MB count = count + 1
26 30.895 MB 0.008 MB cursor.close()
27 30.898 MB 0.004 MB connection.close()
28 30.898 MB 0.000 MB return
Solved by the OP. His original code contained the line
results = [result for result in cursor]
This list comprehension stored the entire result in memory, rather than streaming it from the server as needed. The OP replaced it with a simple
for result in cursor:
...
and saw his memory usage go back to normal.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With