I'm trying to read data slices from a netCDF4 file using netcdf4-python. This is my first time using Python, and I am running into memory issues. Below is a simplified version of the code. On each iteration of the loop, memory usage jumps by roughly the size of the data slice I read. How can I clean up the memory as I iterate over each variable?
#!/usr/bin/env python
from netCDF4 import Dataset
import os
import sys
import psutil

process = psutil.Process(os.getpid())

def print_memory_usage():
    nr_mbytes = process.get_memory_info()[0] / 1048576.0
    sys.stdout.write("{}\n".format(nr_mbytes))
    sys.stdout.flush()

# open input file and gather variable info
rootgrp_i = Dataset('data.nc', 'r')
vargrp_i = rootgrp_i.variables

# let's create a dictionary to store the metadata in
subdomain = {}

for suff in range(1000):
    for var in vargrp_i:
        v_i = vargrp_i[var]
        if v_i.ndim == 1:
            a = v_i[:]
        elif v_i.ndim == 2:
            a = v_i[0:20, 0:20]
        elif v_i.ndim == 3:
            a = v_i[0, 0:20, 0:20]
        elif v_i.ndim == 4:
            a = v_i[0, 0:75, 0:20, 0:20]
        else:
            a = v_i[0]
        del a
        print_memory_usage()

rootgrp_i.close()
I think the problem is a misinterpretation of the meaning of del a.
According to the Python Language Reference:
Deletion of a name removes the binding of that name from the local or global namespace, depending on whether the name occurs in a global statement in the same code block.
This means that del a unbinds the name a, but it doesn't imply that the memory will be released immediately; that depends on how the garbage collector works. You can ask the garbage collector to collect new garbage using the collect() method:
import gc
gc.collect()
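For example, here is a minimal sketch of how gc.collect() could be placed in a loop like the one in the question (it assumes the same data.nc file, that the variables are at least one-dimensional, and that the print_memory_usage helper from the question is already defined):

import gc
from netCDF4 import Dataset

rootgrp_i = Dataset('data.nc', 'r')
vargrp_i = rootgrp_i.variables

for suff in range(1000):
    for var in vargrp_i:
        v_i = vargrp_i[var]
        # read a small slice, much like the question does
        a = v_i[:] if v_i.ndim == 1 else v_i[..., 0:20, 0:20]
        del a              # remove the binding; the array becomes garbage
    gc.collect()           # explicitly run a collection pass after each sweep
    print_memory_usage()   # helper defined in the question

rootgrp_i.close()

Note that gc.collect() can only reclaim arrays that are no longer referenced; if something else (for example a list or dictionary) still holds a reference to the slice, the memory cannot be freed.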
This related post may also be useful.