I am concatenating data to a numpy array like this:
xdata_test = np.concatenate((xdata_test,additional_X))
This is done a thousand times. The arrays have dtype float32
, and their sizes are shown below:
xdata_test.shape : (x1,40,24,24) (x1 : [500~10500])
additional_X.shape : (x2,40,24,24) (x2 : [0 ~ 500])
The problem is that when x1
is larger than ~2000-3000, the concatenation takes a lot longer.
The graph below plots the concatenation time versus the size of the x2
dimension:
Is this a memory issue or a basic characteristic of numpy?
As far as I understand numpy, all the stack
and concatenate
functions are not extremely efficient. And for good reasons, because numpy tries to keep array memory contiguous for efficiency (see this link about contiguous arrays in numpy)
That means that every concatenate operation have to copy the whole data every time. When I need to concatenate a bunch of elements together I tend to do this :
l = []
for additional_X in ...:
l.append(addiional_X)
xdata_test = np.concatenate(l)
That way, the costly operation of moving the whole data is only done once.
NB : would be interested in the speed improvement that gives you.
If you have in advance the arrays you want to concatenate, I would suggest creating a new array with the total shape and fill it with the small arrays rather than concatenating, as every concatenation operation needs to copy the whole data to a new contiguous space of memory.
First, calculate the total size of the first axis:
max_x = 0
for arr in list_of_arrays:
max_x += arr.shape[0]
Second, create the end container:
final_data = np.empty((max_x,) + xdata_test.shape[1:], dtype=xdata_test.dtype)
which is equivalent to (max_x, 40, 24, 24)
but dynamically typed.
Last, fill the numpy array:
curr_x = 0
for arr in list_of_arrays:
final_data[curr_x:curr_x+arr.shape[0]] = arr
curr_x += arr.shape[0]
The loop above, copies each of the arrays to a previously defined column/rows of the larger array.
By doing this, each of the N
arrays will be copied to the exact final destination, rather than creating temporal arrays for each of the concatenation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With