The code is extremely simple. It shouldn't leak, since everything happens inside the function and nothing is returned.
I have a function that goes over all lines in a file (~20 MiB) and puts them into a list.
The function in question:
def read_art_file(filename, path_to_dir):
    import codecs
    corpus = []
    corpus_file = codecs.open(path_to_dir + filename, 'r', 'iso-8859-15')
    newline = corpus_file.readline().strip()
    while newline != '':
        # we put into @article a @newline of file and some other info
        # (i left those lists blank for readability)
        article = [newline, [], [], [], [], [], [], [], [], [], [], [], []]
        corpus.append(article)
        del newline
        del article
        newline = corpus_file.readline().strip()
    memory_usage('inside function')
    for article in corpus:
        for word in article:
            del word
        del article
    del corpus
    corpus_file.close()
    memory_usage('inside: after corp deleted')
    return
Here is the main code:
import time

memory_usage('START')
path_to_dir = '/home/soshial/internship/training_data/parser_output/'
read_art_file('accounting.n.txt.wpr.art', path_to_dir)
memory_usage('outside func')
time.sleep(5)
memory_usage('END')
All memory_usage does is print the amount of memory, in KiB, allocated by the script.
If I run the script, it gives me:
START memory: 6088 KiB
inside memory: 393752 KiB (20 MiB file + lists occupy 400 MiB)
inside: after corp deleted memory: 43360 KiB
outside func memory: 34300 KiB (34300-6088= 28 MiB leaked)
FINISH memory: 34300 KiB
And if I do exactly the same thing, but with the line that appends article to the corpus commented out:
article = [newline, [], [], [], [], [], ...] # we still assign data to `article`
# corpus.append(article) # this line is commented out for the second run
This time the output is:
START memory: 6076 KiB
inside memory: 6076 KiB
inside: after corp deleted memory: 6076 KiB
outside func memory: 6076 KiB
FINISH memory: 6076 KiB
So in this case all the memory is freed. I need all of it freed because I'm going to process hundreds of such files.
Am I doing something wrong, or is this a bug in the CPython interpreter?
UPD. This is how I check memory consumption (taken from another Stack Overflow question):
def memory_usage(text = ''):
    """Memory usage of the current process in kilobytes."""
    status = None
    result = {'peak': 0, 'rss': 0}
    try:
        # This will only work on systems with a /proc file system
        # (like Linux).
        status = open('/proc/self/status')
        for line in status:
            parts = line.split()
            key = parts[0][2:-1].lower()
            if key in result:
                result[key] = int(parts[1])
    finally:
        if status is not None:
            status.close()
    print('>', text, 'memory:', result['rss'], 'KiB ')
    return
What causes memory leaks in Python? Python programs, like programs in other languages, can leak memory. A leak happens when the garbage collector does not reclaim data that is no longer used, typically because something still holds a reference to it.
A memory leak may also happen when an object is stored in memory but cannot be accessed by the running code (i.e. unreachable memory). A memory leak has symptoms similar to a number of other problems and generally can only be diagnosed by a programmer with access to the program's source code.
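In CPython, the usual way to end up with objects that no running code can reach is a reference cycle: reference counting alone cannot free them, and the cyclic collector in the gc module has to find them. Here is a minimal sketch of that situation. Node and make_cycle are made-up names for illustration, and the exact count printed depends on the Python version.

```python
import gc


class Node:
    """A tiny object that can point at another Node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    """Create two objects that reference each other, then drop both names."""
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    # After this function returns, nothing outside can reach a or b,
    # but their reference counts stay non-zero because of the cycle.


gc.disable()                # keep automatic collection out of the demo
gc.collect()                # start from a clean state
make_cycle()
unreachable = gc.collect()  # returns the number of unreachable objects found
gc.enable()

print("unreachable objects collected:", unreachable)
```

As long as the cyclic collector is enabled (it is by default), cycles like this are reclaimed eventually, so on their own they are not a permanent leak.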
You can detect memory leaks in Python by monitoring your Python app's performance via an Application Performance Monitoring tool such as Scout APM. Once you detect a memory leak, there are multiple ways to solve it.
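Without any external tool, the standard library's tracemalloc module can show which source lines hold more memory after some work than before. The sketch below is only an illustration: _cache and handle_request are hypothetical names standing in for code that accidentally keeps references around.

```python
import tracemalloc

_cache = []  # hypothetical module-level list that silently keeps growing


def handle_request(payload):
    """Pretend to do some work, accidentally retaining data on every call."""
    _cache.append(payload * 100)
    return len(payload)


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1000):
    handle_request("request-%d" % i)

after = tracemalloc.take_snapshot()

# Rank source lines by how much more memory they hold now than before.
top_stats = after.compare_to(before, "lineno")
tracemalloc.stop()

for stat in top_stats[:3]:
    print(stat)

grown = sum(stat.size_diff for stat in top_stats)
print("net growth: %d bytes" % grown)
```

The line with the largest positive difference is a good first place to look. Note that tracemalloc only sees allocations made through Python's allocators, so it reports what Python objects occupy, not the process RSS.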
Please note that Python never guarantees that memory your code uses will actually be returned to the OS. All that garbage collection guarantees is that the memory used by a collected object is free to be used by another object at some future time.
From what I've read[1] about CPython's memory allocator, memory is allocated in "pools" for efficiency. When a pool is full, Python allocates a new one. If a pool contains only dead objects, CPython actually frees the memory associated with that pool, but otherwise it doesn't. This can leave multiple partially full pools hanging around after a function returns, for example. However, this doesn't really mean it is a "memory leak": CPython still knows about the memory and could potentially free it at some later time.
[1] I'm not a Python dev, so these details are likely to be incorrect or at least incomplete.
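One way to check the "free to be used by another object" part is to build a large structure twice and watch the process's peak memory. This is a rough sketch, assuming a Unix system where the resource module is available. peak_rss and build_corpus are names made up here, N is arbitrary, and the numbers will differ between platforms and Python versions.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def build_corpus(n):
    """Build n small 'articles' shaped like the ones in the question."""
    return [["line %d" % i] + [[] for _ in range(12)] for i in range(n)]


N = 20000  # arbitrary size, small enough to run quickly

baseline = peak_rss()

corpus = build_corpus(N)
first_peak = peak_rss()
del corpus                # objects go back to CPython's allocator

corpus = build_corpus(N)  # same-sized allocation again
second_peak = peak_rss()
del corpus

print("baseline:   ", baseline)
print("first peak: ", first_peak)
print("second peak:", second_peak)
```

If the freed memory is being reused, the second peak should stay close to the first rather than growing by the size of another corpus. That is the behaviour that matters when processing hundreds of files one after another.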