In my current work I use NumPy and list comprehensions a lot, and in the interest of the best possible performance I have the following questions:
What actually happens behind the scenes if I create a Numpy array as follows?
a = numpy.array([1, 2, 3, 4])
My guess is that Python first creates an ordinary list containing the values, then uses the list size to allocate a NumPy array and afterwards copies the values into this new array. Is this correct, or is the interpreter clever enough to realize that the list is only intermediary and instead copy the values directly?
Similarly, if I wish to create a NumPy array from a list comprehension using numpy.fromiter():

a = numpy.fromiter([x for x in xrange(0, 4)], int)

will this result in an intermediary list of values being created before being fed into fromiter()?
When working with NumPy, keep in mind that list comprehensions remain an option.
Even for the delete operation, the NumPy array is faster. As the array size increases, NumPy can become around 30 times faster than a Python list. Because a NumPy array is densely packed in memory due to its homogeneous type, it also frees memory faster.
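If you want to check that deletion claim on your own machine, here is a minimal sketch; the container size, the index, and the repetition count are illustrative choices of mine, not from the original answer:

import timeit

setup = """
import numpy as np
n = 100000
lst = list(range(n))
arr = np.arange(n)
"""

# Copy the list first so every repetition deletes from identical data;
# np.delete likewise returns a new array rather than mutating in place.
list_time = timeit.timeit("tmp = lst[:]; del tmp[n // 2]", setup=setup, number=100)
array_time = timeit.timeit("tmp = np.delete(arr, n // 2)", setup=setup, number=100)

print("list delete: ", list_time)
print("numpy delete:", array_time)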
The answer is performance. NumPy data structures perform better in two ways: size (they take up less space) and speed (they are faster than lists).
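To see the size difference concretely, a small sketch (exact byte counts vary with platform and Python version):

import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n)

# A list stores pointers to boxed int objects; the array stores raw
# machine integers in one contiguous buffer.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print("list: ", list_bytes, "bytes")
print("array:", sys.getsizeof(arr), "bytes total,", arr.nbytes, "bytes of data")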
List comprehensions on tiny lists are faster than doing the same with NumPy, because the performance gain from using NumPy is not enough to offset the overhead of creating an array.
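A quick illustrative timing of that crossover (the input size and repetition count are my own arbitrary choices):

from timeit import timeit

# On a tiny input, constructing the ndarray costs more than the
# vectorized multiply saves.
print(timeit("[x * 2 for x in range(5)]", number=100000))
print(timeit("np.arange(5) * 2", "import numpy as np", number=100000))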
I believe the answer you are looking for is to use generator expressions with numpy.fromiter.
numpy.fromiter((<some_func>(x) for x in <something>), <dtype>, <size of something>)
Generator expressions are lazy: they evaluate the expression only as you iterate through them. Using a list comprehension builds the whole list first and then feeds it into NumPy, while a generator expression yields one value at a time.
Python evaluates expressions from the inside out, like most languages (if not all), so [<something> for <something_else> in <something_different>] builds the list first, which is then iterated over.
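For example, filling the template in with a concrete function (squaring here is just a stand-in for <some_func>):

import numpy as np

# The generator expression yields one square at a time; passing count
# lets fromiter preallocate the output instead of growing it.
a = np.fromiter((x * x for x in range(4)), dtype=int, count=4)
print(a)  # [0 1 4 9]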
You could create your own list and experiment with it to shed some light on the situation...
>>> class my_list(list):
...     def __init__(self, arg):
...         print('spam')
...         super(my_list, self).__init__(arg)
...     def __len__(self):
...         print('eggs')
...         return super(my_list, self).__len__()
...
>>> x = my_list([0,1,2,3])
spam
>>> len(x)
eggs
4
>>> import numpy as np
>>> np.array(x)  # np.array probes the input's length while building the array
eggs
eggs
eggs
eggs
array([0, 1, 2, 3])
>>> np.fromiter(x, int)  # fromiter just iterates; __len__ is never called
array([0, 1, 2, 3])
>>> np.array(my_list([0,1,2,3]))  # the intermediate list really is built first
spam
eggs
eggs
eggs
eggs
array([0, 1, 2, 3])
To the question in the title: there is now a package called numba which supports NumPy array comprehensions, constructing the NumPy array directly without an intermediate Python list. Unlike numpy.fromiter, it also supports nested comprehensions. However, bear in mind that numba has some restrictions and performance quirks if you are not familiar with it.
That said, it can be quite fast and efficient, but if you can express the computation with NumPy's vectorized operations, it may be better to keep things simple.
>>> from timeit import timeit
>>> # using list comprehension
>>> timeit("np.array([i*i for i in range(1000)])", "import numpy as np", number=1000)
2.544344299999999
>>> # using numpy operations
>>> timeit("np.arange(1000) ** 2", "import numpy as np", number=1000)
0.05207519999999022
>>> # using numpy.fromiter
>>> timeit("np.fromiter((i*i for i in range(1000)), dtype=int, count=1000)",
... "import numpy as np",
... number=1000)
1.087984500000175
>>> # using numba array comprehension
>>> timeit("squares(1000)",
... """
... import numpy as np
... import numba as nb
...
... @nb.njit
... def squares(n):
...     return np.array([i*i for i in range(n)])
...
... 'compile the function'
... squares(10)
... """,
... number=1000)
0.03716940000003888
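Note that the setup string compiles squares once (the squares(10) call) before the timing starts, so the numba figure measures only the compiled calls, not the JIT compilation itself.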