Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why am I leaking memory with this python loop?

I am writing a custom file system crawler, which gets passed millions of globs to process through sys.stdin. I'm finding that when running the script, its memory usage increases massively over time and the whole thing crawls practically to a halt. I've written a minimal case below which shows the problem. Am I doing something wrong, or have I found a bug in Python / the glob module? (I am using python 2.5.2).


#!/usr/bin/env python
import glob
import sys
import gc

previous_num_objects = 0

for count, line in enumerate(sys.stdin):
   glob_result = glob.glob(line.rstrip('\n'))
   current_num_objects = len(gc.get_objects())
   new_objects = current_num_objects - previous_num_objects

   print "(%d) This: %d, New: %d, Garbage: %d, Collection Counts: %s"\
 % (count, current_num_objects, new_objects, len(gc.garbage), gc.get_count())
   previous_num_objects = current_num_objects

The output looks like:

(0) This: 4042, New: 4042, Python Garbage: 0, Python Collection Counts: (660, 5, 0)
(1) This: 4061, New: 19, Python Garbage: 0, Python Collection Counts: (90, 6, 0)
(2) This: 4064, New: 3, Python Garbage: 0, Python Collection Counts: (127, 6, 0)
(3) This: 4067, New: 3, Python Garbage: 0, Python Collection Counts: (130, 6, 0)
(4) This: 4070, New: 3, Python Garbage: 0, Python Collection Counts: (133, 6, 0)
(5) This: 4073, New: 3, Python Garbage: 0, Python Collection Counts: (136, 6, 0)
(6) This: 4076, New: 3, Python Garbage: 0, Python Collection Counts: (139, 6, 0)
(7) This: 4079, New: 3, Python Garbage: 0, Python Collection Counts: (142, 6, 0)
(8) This: 4082, New: 3, Python Garbage: 0, Python Collection Counts: (145, 6, 0)
(9) This: 4085, New: 3, Python Garbage: 0, Python Collection Counts: (148, 6, 0)

Every 100th iteration, 100 objects are freed, so len(gc.get_objects() increases by 200 every 100 iterations. len(gc.garbage) never changes from 0. The 2nd generation collection count increases slowly, while the 0th and 1st counts go up and down.

like image 875
Andy Avatar asked Feb 02 '10 12:02

Andy


People also ask

What can cause memory leak Python?

One of the leading causes of memory leaks across programming languages is the presence of unreferenced objects in the memory. Unreferenced objects are allocated in the memory but the reference to these objects has been deleted in the course of the control flow.

Does Infinite Loop cause memory leak?

Stack memory leaks occur when a method keeps getting called but never exits. This can happen if there is an infinite loop or if the method is being called with different data each time but the data is never used. Eventually, the stack will fill up and the program will run out of memory.


1 Answers

I tracked this down to the fnmatch module. glob.glob calls fnmatch to actually perform the globbing, and fnmatch has a cache of regular expressions which is never cleared. So in this usage, the cache was growing continuously and unchecked. I've filed a bug against the fnmatch library [1].

[1]: http://bugs.python.org/issue7846 Python Bug

like image 78
Andy Avatar answered Sep 29 '22 11:09

Andy