I have need to parse a huge gz file (about ~10GB compressed, ~100GB uncompressed). The code creates data structure ('data_struct') in memory. I am running on a machine with Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
with 16 CPUs and plenty RAM (ie 200+ GB), running CentOS-6.9. I have implemented these things using a Class in Python3.6.3 (CPython) as shown below :
class my_class():
def __init__(self):
cmd = f'gunzip huge-file.gz'
self.process = subprocess(cmd, stdout=subprocess.PIPE, shell=True)
self.data_struct = dict()
def populate_struct(self):
for line in process.stdout:
<populate the self.data_struct dictionary>
def __del__():
self.process.wait()
#del self.data_struct # presence/absence of this statement decreases/increases runtime respectively
#================End of my_class===================
def main():
my_object = my_class()
my_object.populate_struct()
print(f'~~~~ Finished populate_struct() ~~~~') # last statement in my program.
## Python keeps running at 100% past the previous statement for 10+mins
if __name__ == '__main__':
main()
#================End of Main=======================
The resident memory consumption of my data_struct
in memory (RAM only, no swap) is about ~33GB. I did $ top
to find the PID of Python process and traced the Python process using $ strace -p <PID> -o <out_file>
(to see what Python is doing). While it is executing populate_struct()
, I can see in the out_file of strace that Python is using calls like mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0684160000
to create data_struct
. While Python was running past the last print()
statement, I found that Python was issuing only munmap()
operations as shown below :
munmap(0x2b3c75375000, 41947136) = 0
munmap(0x2b3c73374000, 33558528) = 0
munmap(0x2b4015d2a000, 262144) = 0
munmap(0x2b4015cea000, 262144) = 0
munmap(0x2b4015caa000, 262144) = 0
munmap(0x2b4015c6a000, 262144) = 0
munmap(0x2b4015c2a000, 262144) = 0
munmap(0x2b4015bea000, 262144) = 0
munmap(0x2b4015baa000, 262144) = 0
...
...
The Python keeps running anywhere between 10+ mins to 12mins after the last print()
statement. An observation is that if I have del self.data_struct
statement in __del__()
method, then it takes 2mins only. I have done these experiments multiple times and the runtime decrease/increase by the presence/absence of del self.data_struct
in __del__()
.
My questions :
munmap()
, but unlike Python, other languages like Perl immediately release memory and exit the program. Am I doing it right by implementing as shown above ? Is there a way to tell Python to avoid this munmap()
?del self.data_struct
statement in __del__()
, and takes only 2mins to cleanup if there is del self.data_struct
statement in __del__()
?munmap()
?Other thoughts/suggestions about tackling this problem are appreciated.
Please try a more recent version of Python (at least 3.8)? This shows several signs of being a mild(!) form of a worst-case quadratic-time algorithm in CPython's object deallocator, which was rewritten here (and note that the issue linked to here in turn contains a link to an older StackOverflow post with more details):
https://bugs.python.org/issue37029
If my guess is right, the amount of memory isn't particularly important - it's instead the sheer number of distinct Python objects being managed by CPython's "small object allocator" (obmalloc.c
), combined with "bad luck" in the order in which their memory is released.
When that code was first written, RAM wasn't big enough to hold millions of Python objects, so nobody noticed that one particular part of deallocation logic could take time quadratic in the number of allocated "arenas" (details aren't really helpful, but "arenas" are the granularity at which system mmap()
and munmap()
calls are made - 256 KiB chunks).
It's not those mapping calls that are consuming mounds of time, and any decent implementation of any language using OS memory mapping facilities will eventually call munmap()
enough times to release the OS resources consumed by its mmap()
calls.
So that's a red herring. munmap()
is being called many times simply because you allocated many objects, which required many mmap()
calls.
There isn't any crisp or easy way to explain exactly when the problem shows up. See "bad luck" above ;-) The relevant code was rewritten for CPython 3.8 to be worst-case linear time instead, which gave a factor of ~250 speedup for the specific program that triggered the issue report (see the link already given).
As a comment noted, you can exit your program immediately at any time by invoking os._exit()
, but the leading underscore is meant to scare you off: "immediately" means "immediately". No cleanups of any kind are performed. For example, the __del__
method in your class? Skipped. __del__
is run as a side effect of deallocation, but if you actually "immediately release memory and exit the program" then no destructors of any kind are run, nor any handlers registered with the atexit
module, etc etc. It's as drastic as a program dying, e.g., with a segfault.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With