I wrote a small Python batch processor that loads binary data, performs NumPy operations, and stores the results. It consumes much more memory than it should. I have looked at similar Stack Overflow discussions and would like to ask for further recommendations.
I convert spectral data to RGB. The spectral data is stored in a Band Interleaved by Line (BIL) image file, which is why I read and process the data line by line. I read the data using the Spectral Python library, which returns NumPy arrays. hyp is a descriptor of a large spectral file: hyp.ncols = 1600, hyp.nrows = 3430, hyp.nbands = 160.
import spectral
import numpy as np
import scipy.interpolate

class CIE_converter(object):
    def __init__(self, cie):
        self.cie = cie

    def interpolateBand_to_cie_range(self, hyp, hyp_line):
        # Resample one line of spectra onto the CIE wavelength grid (column 0 of cie)
        interp = scipy.interpolate.interp1d(hyp.bands.centers, hyp_line,
                                            kind='cubic', bounds_error=False,
                                            fill_value=0)
        return interp(self.cie[:, 0])

    #@profile
    def spectrum2xyz(self, hyp):
        out = np.zeros((hyp.ncols, hyp.nrows, 3))
        spec_line = hyp.read_subregion((0, 1), (0, hyp.ncols)).squeeze()
        spec_line_int = self.interpolateBand_to_cie_range(hyp, spec_line)
        for ii in xrange(hyp.nrows):
            spec_line = hyp.read_subregion((ii, ii + 1), (0, hyp.ncols)).squeeze()
            spec_line_int = self.interpolateBand_to_cie_range(hyp, spec_line)
            out[:, ii, :] = np.dot(spec_line_int, self.cie[:, 1:4])
        return out
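The per-line pipeline (resample onto the CIE wavelength grid, then project onto the colour-matching columns) can be exercised in isolation on synthetic data. The band centers, the line of spectra, and the toy CIE table below are all made-up stand-ins, not real measurements:

```python
import numpy as np
from scipy.interpolate import interp1d

# Synthetic stand-ins: 160 band centers, one line of 1600 pixels, and a toy
# CIE table with wavelengths in column 0 and x/y/z weights in columns 1:4.
band_centers = np.linspace(400.0, 1000.0, 160)
spec_line = np.random.rand(1600, 160)
cie = np.zeros((95, 4))
cie[:, 0] = np.linspace(380.0, 850.0, 95)  # CIE wavelength grid
cie[:, 1:4] = np.random.rand(95, 3)        # placeholder matching functions

# The same two steps spectrum2xyz performs for a single line:
interp = interp1d(band_centers, spec_line, kind='cubic',
                  bounds_error=False, fill_value=0)
spec_line_int = interp(cie[:, 0])          # shape (1600, 95)
xyz = np.dot(spec_line_int, cie[:, 1:4])   # shape (1600, 3)
print(xyz.shape)
```

interp1d interpolates along the last axis by default, so a (pixels, bands) array is resampled per pixel in one call, and the dot product collapses the resampled spectra to three XYZ values per pixel.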
All the large arrays are initialised outside the loop. My naive interpretation was that the memory consumption should not increase (have I used too much MATLAB?). Can someone explain the factor-of-10 increase to me? It is not linear, as hyp.nrows = 3430. Are there any recommendations to improve the memory management?
Line # Mem usage Increment Line Contents
================================================
76 @profile
77 60.53 MB 0.00 MB def spectrum2xyz(self, hyp):
78 186.14 MB 125.61 MB out = np.zeros((hyp.ncols,hyp.nrows,3))
79 186.64 MB 0.50 MB spec_line = hyp.read_subregion((0,1), (0,hyp.ncols)).squeeze()
80 199.50 MB 12.86 MB spec_line_int = self.interpolateBand_to_cie_range(hyp, spec_line)
81
82 2253.93 MB 2054.43 MB for ii in xrange(hyp.nrows):
83 2254.41 MB 0.49 MB spec_line = hyp.read_subregion((ii,ii+1), (0,hyp.ncols)).squeeze()
84 2255.64 MB 1.22 MB spec_line_int = self.interpolateBand_to_cie_range(hyp, spec_line)
85 2235.08 MB -20.55 MB out[:,ii,:] = np.dot(spec_line_int,self.cie[:,1:4])
86 2235.08 MB 0.00 MB return out
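As a sanity check on the profile, the initial 125.61 MB increment on line 78 is fully explained by the output array alone; the mystery is only the ~2 GB that appears during the loop. A quick back-of-the-envelope calculation (using the dimensions quoted above):

```python
# out = np.zeros((ncols, nrows, 3)) allocates float64 zeros:
ncols, nrows = 1600, 3430
out_bytes = ncols * nrows * 3 * 8   # 8 bytes per float64
out_mib = out_bytes / 2**20
print(round(out_mib, 2))            # ~125.61, matching the profiler line
```

So the allocation of out behaves exactly as expected; whatever grows inside the loop is not the output array.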
I replaced range with xrange without drastic improvement. I'm aware that cubic interpolation is not the fastest, but this question is about memory, not CPU consumption.
Thanks for the comments. They all helped me reduce the memory consumption a little, but eventually I figured out the main reason for the memory consumption:
Spectral Python images contain a NumPy memmap object with the same layout as the hyperspectral data cube on disk (for the BIL format: (nrows, nbands, ncols)). When calling:
spec_line = hyp.read_subregion((ii,ii+1), (0,hyp.ncols)).squeeze()
the image data is not only returned as a NumPy array, but also cached in hyp.memmap. A second call would be faster, but in my case the memory just keeps increasing until the OS starts complaining. As the memmap is actually a great implementation, I will take direct advantage of it in future work.
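A minimal sketch of going through numpy's memmap directly, using a tiny synthetic BIL file with made-up dimensions and dtype (a real file's shape, dtype, byte order, and header offset must come from its .hdr metadata):

```python
import os
import tempfile
import numpy as np

# Hypothetical, scaled-down cube written to disk in BIL order.
nrows, nbands, ncols = 8, 5, 16
path = os.path.join(tempfile.mkdtemp(), 'toy.bil')
np.arange(nrows * nbands * ncols, dtype=np.uint16).tofile(path)

# BIL layout is (nrows, nbands, ncols). A memmap loads pages lazily,
# so slicing one row only touches that row's portion of the file.
cube = np.memmap(path, dtype=np.uint16, mode='r',
                 shape=(nrows, nbands, ncols))
line = np.asarray(cube[3]).T   # copy one line out as (ncols, nbands)
print(line.shape)
```

The np.asarray copy detaches the line from the mapping; dropping the cube reference (or reopening the memmap periodically) lets the OS reclaim the cached pages instead of letting them accumulate.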