
How to copy/repeat an array N times into a new array? [duplicate]

I have:

test = np.random.randn(40,40,3)

And I want to make:

result = Repeat(test, 10)

So that result contains the array test repeated 10 times, with shape:

(10, 40, 40, 3)

In other words, I want to create a tensor with a new leading axis that holds 10 copies of test. I also want to do this as efficiently as possible. How can I do this with NumPy?

JDS asked Jun 14 '18 20:06




3 Answers

You can use np.repeat together with np.newaxis:

import numpy as np

test = np.random.randn(40,40,3)
result = np.repeat(test[np.newaxis,...], 10, axis=0)
print(result.shape)  # (10, 40, 40, 3)
OddNorg answered Oct 21 '22 18:10


Assuming you're looking to copy the values 10 times, you can just stack 10 of the array:

def repeat(arr, count):
    return np.stack([arr for _ in range(count)], axis=0)

axis=0 is actually the default, so it's not really necessary here, but I think it makes it clearer that you're adding the new axis on the front.


In fact, this is pretty much identical to what the examples for stack are doing:

>>> arrays = [np.random.randn(3, 4) for _ in range(10)]
>>> np.stack(arrays, axis=0).shape
(10, 3, 4)

At first glance you might think repeat or tile would be a better fit.

But repeat is about repeating over an existing axis (or flattening the array), so you'd need to reshape either before or after. (Which is just as efficient, but I think not as simple.)

And tile (assuming you use an array-like reps; with a scalar reps it only repeats the array along its last axis) is about filling out a multidimensional spec in all directions, which is much more complex than what you want for this simple case.
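To make the comparison concrete, here is a minimal sketch of the three calls side by side, assuming the same (40, 40, 3) input as in the question. The variable names (via_stack, via_repeat, via_tile, flat) are only illustrative.

```python
import numpy as np

test = np.random.randn(40, 40, 3)

# stack: builds the new leading axis directly
via_stack = np.stack([test] * 10, axis=0)

# repeat: works along an existing axis, so add a length-1 axis first
via_repeat = np.repeat(test[np.newaxis, ...], 10, axis=0)

# tile: reps has one entry per output axis; 10 for the new axis, 1 elsewhere
via_tile = np.tile(test, (10, 1, 1, 1))

# repeat without an axis flattens the input, which is not what we want here
flat = np.repeat(test, 10)

print(via_stack.shape, via_repeat.shape, via_tile.shape, flat.shape)
```

All three 4-D results hold the same values; only the spelling differs.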


All of these options will be equally efficient. They all copy the data 10 times over, which is the expensive part; the cost of any internal processing, building tiny intermediate objects, etc. is irrelevant. The only way to make it faster is to avoid copying. Which you probably don't want to do.

But if you do… To share the underlying storage across the 10 copies, you probably want broadcast_to:

def repeat(arr, count):
    return np.broadcast_to(arr, (count,)+arr.shape)
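As a quick sanity check, the sketch below inspects what broadcast_to returns for this case. The names arr, view and owned are illustrative, and the printed values reflect the behavior of current NumPy releases as far as I know, not a documented guarantee.

```python
import numpy as np

arr = np.random.randn(40, 40, 3)
view = np.broadcast_to(arr, (10,) + arr.shape)

print(view.shape)                   # (10, 40, 40, 3)
print(view.strides[0])              # 0: stepping along the new axis stays at the same memory
print(view.flags.writeable)         # False
print(np.shares_memory(view, arr))  # True

# If an independent, writable array is needed later, copy explicitly.
owned = view.copy()
```

The explicit copy() at the end pays the full copying cost, so it only makes sense once you really need to modify the data.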

Notice that broadcast_to doesn't actually guarantee that it avoids copying, just that it returns some kind of readonly view where "more than one element of a broadcasted array may refer to a single memory location". In practice, it's going to avoid copying. If you actually need that to be guaranteed for some reason (or if you want a writable view—which is usually going to be a terrible idea, but maybe you have a good reason…), you have to drop down to as_strided:

def repeat(arr, count):
    shape = (count,) + arr.shape
    strides = (0,) + arr.strides
    return np.lib.stride_tricks.as_strided(
        arr, shape=shape, strides=strides, writeable=False)

Notice that half the docs for as_strided are warning that you probably shouldn't use it, and the other half are warning that you definitely shouldn't use it for writable views, so… make sure this is what you want before doing it.
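For illustration, here is a small sketch of what the zero stride means in practice, using a tiny hypothetical (2, 3) array instead of the question's data: every "copy" is the same memory, so a change to the source shows up in all of them, and the read-only flag blocks writes through the view.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.zeros((2, 3))
view = as_strided(
    arr, shape=(4,) + arr.shape, strides=(0,) + arr.strides, writeable=False)

arr[0, 0] = 7.0        # modify the source array
print(view[:, 0, 0])   # all four "copies" see the change

try:
    view[0, 0, 0] = 1.0    # writing through the read-only view
except ValueError as exc:
    print("read-only:", exc)
```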

abarnert answered Oct 21 '22 17:10


Of the many ways of creating a proper copy, preallocation + broadcasting seems fastest.

import numpy as np

def f_pp_0():
    out = np.empty((10, *a.shape), a.dtype)
    out[...] = a
    return out

def f_pp_1():
    out = np.empty((10, *a.shape), a.dtype)
    np.copyto(out, a)
    return out

def f_oddn():
    return np.repeat(a[np.newaxis,...], 10, axis=0)

def f_abar():
    return np.stack([a for _ in range(10)], axis=0)

def f_arry():
    return np.array(10*[a])

from timeit import timeit

a = np.random.random((40, 40, 3))

for f in list(locals().values()):
    if callable(f) and f.__name__.startswith('f_'):
        print(f.__name__, timeit(f, number=100000)/100, 'ms')

Sample run:

f_pp_0 0.019641224660445003 ms
f_pp_1 0.019557840081397444 ms
f_oddn 0.01983011547010392 ms
f_abar 0.03257150553865358 ms
f_arry 0.02305851033888757 ms

But the differences are small; for example, repeat is hardly slower, if at all.
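Since timings only matter if the candidates agree, a short check like the following can confirm that they all build the same independent copy. This is a sketch: the helper prealloc and the results dict are illustrative names, not part of the benchmark above.

```python
import numpy as np

a = np.random.random((40, 40, 3))

def prealloc(arr, n=10):
    # Allocate the output once, then let broadcasting fill every slice.
    out = np.empty((n, *arr.shape), arr.dtype)
    out[...] = arr
    return out

results = {
    "prealloc": prealloc(a),
    "repeat": np.repeat(a[np.newaxis, ...], 10, axis=0),
    "stack": np.stack([a for _ in range(10)], axis=0),
    "array": np.array(10 * [a]),
}

reference = results["prealloc"]
for name, r in results.items():
    same = np.array_equal(r, reference)
    independent = not np.shares_memory(r, a)
    print(name, r.shape, same, independent)
```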

Paul Panzer answered Oct 21 '22 18:10