I am running the following code:
for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)
The above code is slow. Is there any faster approach?
NumPy arrays are faster than Python lists because an array is a collection of homogeneous data types stored in contiguous memory locations.
NumPy is fast when an operation is performed on a whole vector (or matrix) at once. Single element-by-element operations are slow.
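As a rough illustration (the array sizes here are made up), a whole-array operation beats the equivalent element-by-element Python loop:

import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# fast: one vectorized operation over the whole arrays
c = a + b

# slow: the same result computed element by element in Python
c_slow = np.empty_like(a)
for i in range(len(a)):
    c_slow[i] = a[i] + b[i]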
This is basically what happens in all algorithms based on arrays.
Each time you change the size of the array, it needs to be resized and every element needs to be copied. That is happening here too. (Some implementations reserve extra empty slots, e.g. by doubling the internal memory with each resize.)
This is not really a numpy-specific topic, but much more about data structures.
Edit: as this quite vague answer got some upvotes, I feel the need to make clear that my linked-list approach is just one possible example. As indicated in the comments, Python's lists are more array-like (and definitely not linked lists). But the core fact is: list.append() in Python is fast (amortized O(1)), while that's not true for numpy arrays! There is also a small part about the internals in the docs:
How are lists implemented?
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
(the key passage being the over-allocation that makes repeated appends cheap)
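A minimal timing sketch (the row data here is made up) contrasting the two growth strategies; the list-based version avoids copying the whole array on every iteration:

import time
import numpy as np

new_rows = [np.arange(100) for _ in range(1000)]  # stand-in for New_Rows

# slow: every concatenate reallocates and copies the whole result so far
start = time.perf_counter()
my_array = np.empty(0, dtype=new_rows[0].dtype)
for row in new_rows:
    my_array = np.concatenate((my_array, row), axis=0)
print("concatenate in a loop:", time.perf_counter() - start)

# fast: list.append is amortized O(1); convert to an array once at the end
start = time.perf_counter()
collected = []
for row in new_rows:
    collected.append(row)
my_array2 = np.concatenate(collected, axis=0)
print("append, then one concatenate:", time.perf_counter() - start)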
Maybe creating an empty array with the correct size and then populating it? If you have a list of arrays with the same dimensions you could do:
import numpy as np

arr = np.zeros((len(l),) + l[0].shape)
for i, v in enumerate(l):
    arr[i] = v
This works much faster for me; it only requires one memory allocation.
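For example, with a hypothetical l of 1000 equally shaped rows, the loop only fills slots in an array that was allocated once:

import numpy as np

l = [np.random.rand(50) for _ in range(1000)]  # hypothetical list of same-shape rows

arr = np.zeros((len(l),) + l[0].shape)  # single allocation of the final (1000, 50) array
for i, v in enumerate(l):
    arr[i] = v
print(arr.shape)  # (1000, 50)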
It depends on what New_Rows[i] is, and what kind of array you want. If you start with lists (or 1d arrays) that you want to join end to end (to make a long 1d array), just concatenate them all at once. concatenate takes a list of any length, not just 2 items.
np.concatenate(New_Rows, axis=0)
or maybe use an intermediate list comprehension (for more flexibility):
np.concatenate([row for row in New_Rows])
or, closer to your example:
np.concatenate([New_Rows[i] for i in range(1000)])
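For instance, assuming New_Rows holds 1d pieces (the lengths here are made up), a single call joins them all end to end:

import numpy as np

New_Rows = [np.arange(3), np.arange(5), np.arange(2)]  # hypothetical 1d pieces
long_1d = np.concatenate(New_Rows, axis=0)
print(long_1d.shape)  # (10,)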
But if the New_Rows elements are all the same length, and you want a 2d array with one New_Rows value per row, np.array does a nice job:
np.array(New_Rows)
np.array([i for i in New_Rows])
np.array([New_Rows[i] for i in range(1000)])
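Assuming, for illustration, that every element of New_Rows has the same length:

import numpy as np

New_Rows = [np.arange(4) for _ in range(1000)]  # hypothetical equal-length rows
arr_2d = np.array(New_Rows)
print(arr_2d.shape)  # (1000, 4)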
np.array is designed primarily to build an array from a list of lists. np.concatenate can also build in 2d, but the inputs need to be 2d to start with; vstack and stack can take care of that. But all those stack functions use some sort of list comprehension followed by concatenate.
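For example (hypothetical 1d rows again), vstack and stack promote 1d inputs before joining them:

import numpy as np

rows = [np.arange(4) for _ in range(3)]  # hypothetical 1d rows
print(np.vstack(rows).shape)         # (3, 4): each 1d row becomes a 2d row
print(np.stack(rows, axis=0).shape)  # (3, 4): stack adds a new leading axis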
In general it is better/faster to iterate or append with lists, and apply np.array (or concatenate) just once. Appending to a list is fast; much faster than making a new array.