Converting Python List to Numpy Array InPlace

I have a huge Python list (16 GB) that I want to convert to a numpy array, in place. I can't afford this statement:

huge_array = np.array(huge_list).astype(np.float16)

I'm looking for an efficient way to transform this huge_list into a numpy array without making a copy of it.

Can anyone suggest an efficient way to do this? If it involves saving the list to disk first and then loading it back as a numpy array, I'm fine with that.

Any help would be highly appreciated.

EDIT 1: huge_list is an in-memory Python list created at runtime, so it is already taking 16 GB. I need to convert it to a numpy float16 array.

asked Feb 11 '26 by Ahmed

2 Answers

np.array(huge_list, dtype=np.float16) will be faster, since it copies the list only once instead of twice.
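
A quick check of the one-step conversion (with a small list standing in for the huge one):

```python
import numpy as np

data = [float(i) for i in range(1000)]

# One-step: build the float16 array directly from the list, so only one
# numpy array is ever allocated. The two-step np.array(...).astype(...)
# form materializes an intermediate float64 array first.
arr = np.array(data, dtype=np.float16)

print(arr.dtype)   # float16
print(arr.nbytes)  # 2 bytes per element
```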


You probably don't need to worry about making this copy, because the copy is a lot smaller than the original:

>>> import sys
>>> import numpy as np
>>> x = [float(i) for i in range(10000)]
>>> sys.getsizeof(x)
83112
>>> y = np.array(x, dtype=np.float16)
>>> sys.getsizeof(y)
20096

But that's not even the worst of it: with the Python list, each number in the list takes up memory of its own:

>>> sum(sys.getsizeof(i) for i in x)
240000

So the numpy array is roughly 16x smaller!

answered Feb 14 '26 by Eric


As I previously mentioned, the easiest approach is to dump the list to a file and then load that file back as a numpy array.

First we need the size of the huge list:

huge_list_size = len(huge_list)

Next we dump it to disk:

with open('huge_array.txt', 'w') as dumpfile:
    for item in huge_list:
        dumpfile.write(str(item) + "\n")

We make sure to free the list's memory, since all of this happens in the same session:

del huge_list

Next we define a simple read generator:

def read_file_generator(filename):
    with open(filename) as infile:
        for i, line in enumerate(infile):
            yield i, float(line)

Then we create a numpy array of zeros and fill it from the generator we just created:

huge_array = np.zeros(huge_list_size, dtype='float16')

for i, item in read_file_generator('huge_array.txt'):
    huge_array[i] = item
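
As a variation on the refill step (an alternative I'm sketching here, not part of the original recipe), np.fromiter can consume such a generator directly; passing count lets numpy allocate the final array up front instead of growing it:

```python
import numpy as np

# Small stand-in dump file; the real one would be the 16 GB dump above.
with open('huge_array.txt', 'w') as f:
    for i in range(100):
        f.write(str(0.5 * i) + "\n")

def read_values(filename):
    # Yield one float per line, lazily, so the file is never fully in memory.
    with open(filename) as infile:
        for line in infile:
            yield float(line)

# fromiter fills the float16 array straight from the generator.
huge_array = np.fromiter(read_values('huge_array.txt'),
                         dtype=np.float16, count=100)
```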

My previous answer was incorrect. I suggested the following as a solution, which, as hpaulj pointed out in the comments, it is not:

You can do this in multiple ways; the easiest would be to just dump the array to a file and then load that file as a numpy array:

with open('huge_array.txt', 'w') as dumpfile:
    for item in huge_array:
        print(item, file=dumpfile)

Then load it as a numpy array:

huge_array = numpy.loadtxt('huge_array.txt')

If you want to perform further computations on this data, you can also use the joblib library for memory-mapping, which is extremely useful for handling computations on large numpy arrays. Available at https://pypi.python.org/pypi/joblib
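
A minimal sketch of the memory-mapping idea using numpy's own np.memmap (joblib's memmapping builds on the same mechanism; the filename and size here are made up for illustration):

```python
import numpy as np

n = 1000  # stands in for the real huge_list_size

# Create a float16 array backed by a file on disk; only the pages
# actually touched get loaded into RAM.
mm = np.memmap('huge_array.dat', dtype=np.float16, mode='w+', shape=(n,))

# Fill it incrementally, e.g. from the read generator above.
for i in range(n):
    mm[i] = 0.25 * i

mm.flush()  # make sure everything is written out to disk

# Later (even from another process) reopen it read-only.
mm2 = np.memmap('huge_array.dat', dtype=np.float16, mode='r', shape=(n,))
```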

answered Feb 14 '26 by Laurens


