So I have created a list for multiprocessing (in particular, for multiprocessing.Pool().starmap()) and I want to reduce its memory footprint. The list is built as follows:
import sys
import numpy as np
from itertools import product

lst1 = np.arange(1000)           # default integer dtype (int32 on this platform)
lst3 = np.arange(0.05, 4, 0.05)

# pair every (index, value) of lst3 with every pair (i, j) from lst1 where i < j
lst1_1 = list(product(enumerate(lst3),
                      (item for item in product(lst1, lst1) if item[0] < item[1])))
Its memory size, as reported by sys.getsizeof(lst1_1), is 317840928 bytes.
Seeing that the dtype of lst1 is int32, I thought changing it to int16 would cut the memory size of lst1, and consequently of lst1_1, in half, since int16 takes up half the memory of int32 data. So I did the following:
lst2 = np.arange(1000, dtype=np.int16)  # same values, half-width integers

lst2_1 = list(product(enumerate(lst3),
                      (item for item in product(lst2, lst2) if item[0] < item[1])))
Surprisingly, the memory size of lst2_1, as reported by sys.getsizeof(lst2_1), is still 317840928.
My questions are the following:
1) Is the memory size of the list independent of the datatype of the source data?
2) If so, then what's the best way to reduce the memory size of the list without converting to a generator?
Note that converting to a generator won't help here: even if I pass a generator, multiprocessing.Pool().starmap() converts it back to a list internally anyway.
You are converting the arrays to Python lists before you check their size. In a list, every integer becomes a separate Python object, and sys.getsizeof() on a list only measures the list object itself (essentially an array of pointers to its elements), so the result is the same regardless of the source dtype. Here is an example of this behavior with your code:
import sys
import numpy as np

lst1 = np.arange(1000)
lst2 = np.arange(1000, dtype=np.int16)

print(sys.getsizeof(lst1))        # 4096  -- 1000 * 4 bytes + array overhead
print(sys.getsizeof(lst2))        # 2096  -- 1000 * 2 bytes + array overhead
print(sys.getsizeof(list(lst1)))  # 9112  -- just the list's pointer array
print(sys.getsizeof(list(lst2)))  # 9112  -- same size: the dtype no longer matters
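To see that getsizeof on a list counts only the per-element pointers and not the element objects themselves, you can measure an element separately. A minimal sketch (exact byte counts vary by Python version and platform):

import sys

lst = list(range(1000))
total = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
print(sys.getsizeof(lst))     # size of the list object only (header + pointer array)
print(sys.getsizeof(lst[1]))  # each small int is its own Python object (~28 bytes on 64-bit CPython)
print(total)                  # a truer account of the memory the list keeps alive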
NumPy is a C-based library, so you can choose which integer type to use (just like C's int, long, long long). For those advantages to hold, the data needs to stay in that C-level representation; that's why NumPy has so many functions of its own, keeping both the data and the operations at a lower level.
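So, to actually reduce memory, keep the data inside NumPy instead of expanding it into a list of Python objects. As a minimal sketch (assuming the goal is the i < j index pairs from the question), np.triu_indices builds all such pairs directly as C-level arrays, and there a narrower dtype really does shrink the raw data:

import numpy as np

n = 1000
i, j = np.triu_indices(n, k=1)        # all pairs (i, j) with i < j, as C-level arrays
pairs = np.stack((i, j), axis=1)      # shape (499500, 2)
print(pairs.nbytes)                   # raw data size with the default dtype
print(pairs.astype(np.int16).nbytes)  # values < 1000 fit in int16, so this is far smaller

You could then hand workers slices of such an array (or plain indices into it) rather than materializing hundreds of millions of Python objects up front.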