I'm developing a data-analysis worker in Python using numpy and pandas. I will deploy many of these workers, so I want to keep each one lightweight.
I tried checking memory usage with this code:
import logging
import resource

logging.basicConfig(level=logging.DEBUG)

def printmemory(msg):
    # ru_maxrss is the peak RSS: kilobytes on Linux, bytes on macOS
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logging.debug('%s: total memory: %r Mb', msg, currentmemory / 1000000.)

printmemory('begin')

# from numpy import array, nan, mean, std, sqrt, square
import numpy as np
printmemory('numpy')

import pandas as pd
printmemory('pandas')
and I found that simply importing them already makes each worker pretty heavy. Is there a way to reduce the memory footprint of numpy and pandas?
Otherwise, are there any suggestions for a better solution?
Python does not free memory back to the operating system immediately after an object is destroyed. It keeps object pools, called arenas, and it takes a while until those are released. In some cases you may also be suffering from memory fragmentation, which likewise causes the process's memory usage to grow.
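A minimal sketch of that effect, assuming the third-party psutil package (not used in the question) so the current RSS can be read rather than the peak:

import gc
import psutil

proc = psutil.Process()

def rss_mb():
    return proc.memory_info().rss / 1e6

print('baseline      :', rss_mb())
data = [object() for _ in range(3_000_000)]   # fills many small-object arenas
print('allocated     :', rss_mb())
del data
gc.collect()
print('after del + gc:', rss_mb())   # often stays well above baseline: arenas are released lazily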
Use #pragma pack(1) to byte-align your structures. Use unions where a structure can contain different kinds of data. Use bit fields rather than ints to store flags and small integers. Avoid fixed-length character arrays for storing strings; implement a string pool and use pointers.
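Those are C-level techniques. As a rough Python analogue (my illustration, not part of the answer above), the standard struct module shows how alignment padding inflates a record compared with a byte-packed layout:

import struct

# '@' uses native alignment (padding is inserted); '=' packs the fields with
# no padding, similar in spirit to #pragma pack(1).
native = struct.calcsize('@bid')   # char + int + double with natural alignment
packed = struct.calcsize('=bid')   # same fields, byte-packed
print(native, packed)              # e.g. 16 vs 13 on a typical 64-bit platform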
The Python memory manager is involved only in the allocation of the bytes object returned as a result; memory that a C extension allocates with the raw C allocator stays outside its control. In most situations, however, it is recommended to allocate memory from the Python heap specifically, because the latter is under the control of the Python memory manager.
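For illustration only (my sketch, using the standard-library tracemalloc module): allocations made through the Python memory manager can be observed with tracemalloc, which sees only the Python heap, not memory a C extension grabs directly with malloc:

import tracemalloc

tracemalloc.start()
data = [b'x' * 1024 for _ in range(1000)]          # allocated on the Python heap
current, peak = tracemalloc.get_traced_memory()    # bytes currently traced / peak
print('current: %.0f KiB, peak: %.0f KiB' % (current / 1024, peak / 1024))
tracemalloc.stop()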
Sorry to tell you, but there is no way to load only part of a Python module into memory. You could use multi-threading if that fits your case; threads can share the same module memory (see the sketch below).
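A minimal sketch of that idea, assuming a thread-pool worker; the file paths and the analyse() function are illustrative only:

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd   # imported once, shared by every thread in the process

def analyse(path):
    # hypothetical per-task work; note that CPU-bound pure-Python code
    # will still serialize on the GIL
    df = pd.read_csv(path)
    return float(np.sqrt(df.select_dtypes('number').mean()).mean())

if __name__ == '__main__':
    paths = ['a.csv', 'b.csv', 'c.csv']             # placeholder inputs
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(analyse, paths))
    print(results)

This trades per-worker process isolation for a single shared copy of the imported libraries.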