
Why is converting a long 2D list to a numpy array so slow?

I have a long list of xy coordinates and would like to convert it into a numpy array.

>>> import numpy as np
>>> xy = np.random.rand(1000000, 2).tolist()

The obvious way would be:

>>> a = np.array(xy) # Very slow... 

However, the above code is unreasonably slow. Interestingly, transposing the long list first, converting it into a numpy array, and then transposing back is much faster (20x on my laptop).

>>> def longlist2array(longlist):
...     wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
...     return np.array(wide).T
...
>>> a = longlist2array(xy) # 20x faster!

Is this a bug in numpy?

EDIT:

This is a list of points (with xy coordinates) generated on the fly, so instead of preallocating an array and enlarging it when necessary, or maintaining two 1D lists for x and y, I think the current representation is the most natural.

Why is looping through the 2nd index faster than looping through the 1st index, given that we are iterating through a Python list in both directions?

EDIT 2:

Based on @tiago's answer and this question, I found the following code to be twice as fast as my original version:

>>> from itertools import chain
>>> def longlist2array(longlist):
...     flat = np.fromiter(chain.from_iterable(longlist), np.array(longlist[0][0]).dtype, -1) # Without intermediate list:)
...     return flat.reshape((len(longlist), -1))
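
The variants above can be compared side by side with a small harness like the one below. This is only a sketch: the names `via_transpose`, `via_fromiter`, and `compare` are made up for illustration, the dtype is assumed to be float64, and the timings will depend on the machine and the NumPy version, so no numbers are claimed here.

```python
import timeit
from itertools import chain

import numpy as np


def via_transpose(longlist):
    # Build one Python list per column, convert, then transpose back.
    wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
    return np.array(wide).T


def via_fromiter(longlist):
    # Stream every scalar into a flat float64 array (assumed dtype),
    # then reshape to (rows, cols).
    flat = np.fromiter(chain.from_iterable(longlist), dtype=np.float64)
    return flat.reshape(len(longlist), -1)


def compare(n_rows=100_000, repeat=3):
    """Return the best wall-clock time in seconds for each conversion."""
    xy = np.random.rand(n_rows, 2).tolist()
    candidates = {
        "np.array": np.array,
        "transpose": via_transpose,
        "fromiter": via_fromiter,
    }
    reference = np.array(xy)
    timings = {}
    for name, func in candidates.items():
        # Every variant must produce exactly the same array.
        assert np.array_equal(func(xy), reference)
        timings[name] = min(timeit.repeat(lambda: func(xy), number=1, repeat=repeat))
    return timings


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>10}: {seconds * 1e3:.1f} ms")
```

Taking the minimum over a few repeats reduces noise from other processes. The equality check guards against comparing functions that do not actually compute the same result.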
herrlich10 asked Jul 31 '13 14:07




2 Answers

Implementing this in Cython without the extra checking involved to determine dimensionality, etc. nearly eliminates the time difference you are seeing. Here's the .pyx file I used to verify that.

    from numpy cimport ndarray as ar
    import numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def toarr(xy):
        # The shape is taken from the first row only; no per-row checks.
        cdef int i, j, h=len(xy), w=len(xy[0])
        cdef ar[double,ndim=2] new = np.empty((h,w))
        # range works in both Python 2 and Python 3 builds of Cython.
        for i in range(h):
            for j in range(w):
                new[i,j] = xy[i][j]
        return new

I would assume that the extra time is spent in checking the length and content of each sublist in order to determine the datatype, dimension, and size of the desired array. When there are only two sublists, it only has to check two lengths to determine the number of columns in the array, instead of checking 1000000 of them.
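
The same idea can be sketched in plain Python, without Cython: fix the shape from the first row, allocate the array once, and fill it one column at a time. The function name `fill_preallocated` is hypothetical, and the sketch assumes every row has the same length as the first one and holds values convertible to float.

```python
import numpy as np


def fill_preallocated(xy):
    # Take the shape from the first row only, as in the Cython version.
    h, w = len(xy), len(xy[0])
    out = np.empty((h, w), dtype=np.float64)
    for j in range(w):
        # Each column is a flat list of scalars, which NumPy converts
        # without having to discover a nested shape.
        # A row shorter than w raises IndexError here; extra items in
        # longer rows are silently ignored.
        out[:, j] = [row[j] for row in xy]
    return out
```

This trades the generality of `np.array()` for the assumption that the input is rectangular, which is the same trade the Cython function makes.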

IanH answered Sep 23 '22 10:09



This is because the fastest-varying index of your list is the last one, so np.array() has to traverse the list many times because the first index is much larger. If your list were transposed, np.array() would be faster than your longlist2array:

In [65]: import numpy as np

In [66]: xy = np.random.rand(10000, 2).tolist()

In [67]: %timeit longlist2array(xy)
100 loops, best of 3: 3.38 ms per loop

In [68]: %timeit np.array(xy)
10 loops, best of 3: 55.8 ms per loop

In [69]: xy = np.random.rand(2, 10000).tolist()

In [70]: %timeit longlist2array(xy)
10 loops, best of 3: 59.8 ms per loop

In [71]: %timeit np.array(xy)
1000 loops, best of 3: 1.96 ms per loop

There is no magical solution for your problem. It's just how Python stores your list in memory. Do you really need to have a list with that shape? Can't you reverse it? (And do you really need a list, given that you're converting to numpy?)
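
If the points really are produced one at a time, one possible way to avoid the nested list altogether is to append the coordinates to a flat buffer of C doubles and reshape once at the end. The sketch below is only an illustration of that idea: `PointBuffer` is a hypothetical helper, not something from the question, and it assumes float64 coordinates with exactly two values per point.

```python
from array import array

import numpy as np


class PointBuffer:
    """Hypothetical helper: collects (x, y) pairs in one flat buffer of doubles."""

    def __init__(self):
        self._flat = array("d")  # contiguous C doubles, grows like a list

    def add(self, x, y):
        self._flat.append(x)
        self._flat.append(y)

    def to_array(self):
        # frombuffer reads the doubles directly, with no per-item conversion;
        # copy() detaches the result from the underlying buffer.
        return np.frombuffer(self._flat, dtype=np.float64).reshape(-1, 2).copy()
```

A typical use would be to call `add(x, y)` as each point is generated and `to_array()` once when an (n, 2) array is needed.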

If you must convert a list, this function is about 10% faster than your longlist2array:

    from itertools import chain

    import numpy as np

    def convertlist(longlist):
        # Flatten the nested list, convert once, then restore the 2D shape.
        tmp = list(chain.from_iterable(longlist))
        return np.array(tmp).reshape((len(longlist), len(longlist[0])))
tiago answered Sep 22 '22 10:09
