
Efficient creation of numpy arrays from list comprehension and in general

In my current work I use NumPy and list comprehensions a lot, and in the interest of the best possible performance I have the following questions:

What actually happens behind the scenes if I create a Numpy array as follows?

a = numpy.array( [1,2,3,4] )

My guess is that Python first creates an ordinary list containing the values, then uses the list's size to allocate a NumPy array, and afterwards copies the values into this new array. Is this correct, or is the interpreter clever enough to realize that the list is only an intermediary and copy the values directly instead?

Similarly, if I wish to create a NumPy array from a list comprehension using numpy.fromiter():

a = numpy.fromiter( [ x for x in xrange(0,4) ], int )

will this result in an intermediary list of values being created before being fed into fromiter()?

NielsGM asked Jan 17 '13 05:01




3 Answers

I believe the answer you are looking for is to use a generator expression with numpy.fromiter.

numpy.fromiter((<some_func>(x) for x in <something>),<dtype>,<size of something>)

Generator expressions are lazy - they evaluate the expression when you iterate through them.

A list comprehension builds the whole list first and only then hands it to NumPy, whereas a generator expression yields its values one at a time.

Python evaluates a call's arguments before making the call (inside out, like most languages, if not all), so passing [<something> for <something_else> in <something_different>] builds the complete list first, and only then does fromiter iterate over it.
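As a concrete version of the template above, here is a minimal sketch. The function name squares_from_generator and the choice of np.int64 are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def squares_from_generator(n):
    """Build an array of squares without creating a Python list first."""
    # Parentheses (not square brackets) make this a generator expression:
    # each value is produced only when fromiter asks for the next one.
    gen = (x * x for x in range(n))
    # Passing count lets NumPy allocate the output once instead of
    # growing it while it consumes the iterator.
    return np.fromiter(gen, dtype=np.int64, count=n)

a = squares_from_generator(4)
print(a)  # [0 1 4 9]
```

If the number of items is not known in advance, count can be left out, at the cost of NumPy resizing the array as it goes.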

Snakes and Coffee answered Sep 26 '22 10:09


You could create your own list subclass and experiment with it to shed some light on the situation:

>>> class my_list(list):
...     def __init__(self, arg):
...         print('spam')
...         super(my_list, self).__init__(arg)
...     def __len__(self):
...         print('eggs')
...         return super(my_list, self).__len__()
... 
>>> x = my_list([0,1,2,3])
spam
>>> len(x)
eggs
4
>>> import numpy as np
>>> np.array(x)
eggs
eggs
eggs
eggs
array([0, 1, 2, 3])
>>> np.fromiter(x, int)
array([0, 1, 2, 3])
>>> np.array(my_list([0,1,2,3]))
spam
eggs
eggs
eggs
eggs
array([0, 1, 2, 3])
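In the same experimental spirit, here is a sketch for Python 3 that logs when each value is actually produced. The names traced_squares and events are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

events = []

def traced_squares(n):
    """Yield i*i for i in range(n), logging each value as it is produced."""
    for i in range(n):
        events.append("produce %d" % i)
        yield i * i

# Case 1: hand the generator straight to fromiter.
gen = traced_squares(3)
events.append("generator created")  # nothing has been produced yet
lazy = np.fromiter(gen, dtype=np.int64, count=3)
events.append("array built")
lazy_events = list(events)

# Case 2: wrap the same generator in a list comprehension first.
events.clear()
tmp = [v for v in traced_squares(3)]
events.append("list built")  # every value already exists at this point
eager = np.array(tmp)
events.append("array built")
eager_events = list(events)

print(lazy_events)
print(eager_events)
```

In the first log, "generator created" comes before every "produce" entry, so the values are produced while fromiter is running. In the second, all "produce" entries come before "list built", so the full list exists before np.array is called.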
wim answered Sep 26 '22 10:09


As for the question in the title: there is now a package called numba that supports NumPy array comprehensions, constructing the array directly without an intermediate Python list. Unlike numpy.fromiter, it also supports nested comprehensions. Bear in mind, though, that numba has some restrictions and performance quirks that can be surprising if you are not familiar with it.

That said, it can be quite fast and efficient. If you can express the computation with NumPy's vectorized operations, though, it may be better to keep things simple and do that.

>>> from timeit import timeit
>>> # using list comprehension
>>> timeit("np.array([i*i for i in range(1000)])", "import numpy as np", number=1000)
2.544344299999999
>>> # using numpy operations
>>> timeit("np.arange(1000) ** 2", "import numpy as np", number=1000)
0.05207519999999022
>>> # using numpy.fromiter
>>> timeit("np.fromiter((i*i for i in range(1000)), dtype=int, count=1000)",
...        "import numpy as np",
...        number=1000)
1.087984500000175
>>> # using numba array comprehension
>>> timeit("squares(1000)",
... """
... import numpy as np
... import numba as nb
... 
... @nb.njit
... def squares(n):
...     return np.array([i*i for i in range(n)])
... 
... # call once up front so JIT compilation is not included in the timing
... squares(10)
... """,
... number=1000)
0.03716940000003888
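The timings above are only comparable if the variants compute the same result. A quick sanity check, sketched here without the numba variant so that it runs with NumPy alone (the variable names are chosen for illustration):

```python
import numpy as np

n = 1000

# The three pure-NumPy variants from the timings above.
from_list = np.array([i * i for i in range(n)])
vectorised = np.arange(n) ** 2
from_iter = np.fromiter((i * i for i in range(n)), dtype=int, count=n)

# array_equal compares shape and values, ignoring integer width differences.
same = np.array_equal(from_list, vectorised) and np.array_equal(vectorised, from_iter)
print(same)  # True
```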
Simply Beautiful Art answered Sep 24 '22 10:09