How to copy/repeat an array N times into a new array? [duplicate]

Tags:

python

arrays

numpy

I have:

test = np.random.randn(40,40,3)

And I want to make:

result = Repeat(test, 10)

So that result contains the array test repeated 10 times, with shape:

(10, 40, 40, 3)

So create a tensor with a new axis to hold 10 copies of test. I also want to do this as efficiently as possible. How can I do this with Numpy?

569

asked Jun 14 '18 20:06

JDS

3 Answers

One can use np.repeat together with np.newaxis:

import numpy as np

test = np.random.randn(40,40,3)
result = np.repeat(test[np.newaxis,...], 10, axis=0)
print(result.shape)  # (10, 40, 40, 3)
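To make the two steps explicit, here is a small check. It assumes only NumPy; the seeded generator and the name `expanded` are illustrative additions, not part of the original answer. `test[np.newaxis, ...]` adds a length-1 leading axis, and `np.repeat` then repeats along that axis.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the run is reproducible
test = rng.standard_normal((40, 40, 3))

# Step 1: insert a new leading axis of length 1 (test[None, ...] is equivalent).
expanded = test[np.newaxis, ...]
print(expanded.shape)  # (1, 40, 40, 3)

# Step 2: repeat along that new axis.
result = np.repeat(expanded, 10, axis=0)
print(result.shape)  # (10, 40, 40, 3)

# Every slice along axis 0 equals the original array.
print(all(np.array_equal(result[i], test) for i in range(10)))  # True
```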

answered Oct 21 '22 18:10

OddNorg

Assuming you're looking to copy the values 10 times, you can just stack 10 copies of the array:

import numpy as np

def repeat(arr, count):
    return np.stack([arr for _ in range(count)], axis=0)

axis=0 is actually the default, so it's not really necessary here, but I think it makes it clearer that you're adding the new axis at the front.

In fact, this is pretty much identical to what the examples in the np.stack documentation do:

>>> arrays = [np.random.randn(3, 4) for _ in range(10)]
>>> np.stack(arrays, axis=0).shape
(10, 3, 4)
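Applied to the array from the question, the stack-based helper gives the requested shape. A quick usage sketch (the comparison against np.repeat is added here only as a cross-check):

```python
import numpy as np

def repeat(arr, count):
    # The list holds `count` references to the same array (no copies yet);
    # np.stack copies each one into a fresh array with a new leading axis.
    return np.stack([arr for _ in range(count)], axis=0)

test = np.random.randn(40, 40, 3)
result = repeat(test, 10)
print(result.shape)  # (10, 40, 40, 3)
print(np.array_equal(result, np.repeat(test[np.newaxis, ...], 10, axis=0)))  # True
```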

At first glance you might think repeat or tile would be a better fit.

But repeat repeats elements along an existing axis (or over the flattened array if you don't pass an axis), so you'd need to reshape either before or after. (That is just as efficient, but I think not as simple.)
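To illustrate the point about repeat, here is a sketch of the pitfall and of both reshape options. The names `flat_rows`, `before` and `after` are hypothetical, and the "after" variant needs a `swapaxes` as well as a reshape, which is part of why it is less simple.

```python
import numpy as np

test = np.random.randn(40, 40, 3)

# Pitfall: repeating along the existing axis 0 duplicates each row in place
# (row0 x10, row1 x10, ...), so no new axis appears.
flat_rows = np.repeat(test, 10, axis=0)
print(flat_rows.shape)  # (400, 40, 3)

# Reshape before: add a length-1 leading axis, then repeat along it.
before = np.repeat(test.reshape((1,) + test.shape), 10, axis=0)
print(before.shape)  # (10, 40, 40, 3)

# Reshape after: split axis 0 into (40, 10), then move the copy axis to the front.
after = flat_rows.reshape(40, 10, 40, 3).swapaxes(0, 1)
print(np.array_equal(before, after))  # True
```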

And tile (assuming you use an array-like reps; with a scalar reps it just repeats the whole array along the last axis) is about filling out a multidimensional spec in all directions, which is much more complex than what you want for this simple case.
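For comparison, a sketch of what tile does with the two kinds of reps. When reps is longer than `test.ndim`, np.tile first treats `test` as if it had extra leading length-1 axes, so `(10, 1, 1, 1)` produces the requested layout; a scalar reps only tiles along the last axis.

```python
import numpy as np

test = np.random.randn(40, 40, 3)

# Array-like reps of length 4: test is treated as shape (1, 40, 40, 3),
# then tiled 10 times along that new leading axis.
tiled = np.tile(test, (10, 1, 1, 1))
print(tiled.shape)  # (10, 40, 40, 3)
print(np.array_equal(tiled, np.repeat(test[np.newaxis, ...], 10, axis=0)))  # True

# Scalar reps: the whole array is repeated along the last axis only.
print(np.tile(test, 10).shape)  # (40, 40, 30)
```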

All of these options will be equally efficient. They all copy the data 10 times over, which is the expensive part; the cost of any internal processing, building tiny intermediate objects, etc. is irrelevant. The only way to make it faster is to avoid copying, which you probably don't want to do.
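One way to confirm that these approaches really do copy is a sketch like the following. The `candidates` dict is a hypothetical name; `nbytes` is simply `size * itemsize`, so it is the `shares_memory` check that shows those bytes were actually copied rather than shared with `test`.

```python
import numpy as np

test = np.random.randn(40, 40, 3)  # float64, so test.nbytes == 38400
candidates = {
    "repeat": np.repeat(test[np.newaxis, ...], 10, axis=0),
    "stack": np.stack([test] * 10, axis=0),
    "tile": np.tile(test, (10, 1, 1, 1)),
}

for name, result in candidates.items():
    # 10 x test.nbytes worth of elements, none of it overlapping test's memory.
    print(name, result.nbytes, np.shares_memory(result, test))
# repeat 384000 False
# stack 384000 False
# tile 384000 False
```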

But if you do… To share the underlying storage across the 10 copies, you probably want broadcast_to:

def repeat(arr, count):
    return np.broadcast_to(arr, (count,)+arr.shape)
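A short sketch of how the broadcast_to version behaves (the names `view` and the value 42.0 are illustrative): the new axis has stride 0, the result is read-only, and because the memory is shared, a change to the original shows up in every "copy".

```python
import numpy as np

def repeat(arr, count):
    return np.broadcast_to(arr, (count,) + arr.shape)

test = np.random.randn(40, 40, 3)
view = repeat(test, 10)

print(view.shape)                    # (10, 40, 40, 3)
print(view.strides[0])               # 0: every step along the new axis hits the same data
print(view.flags.writeable)          # False: broadcast_to returns a read-only view
print(np.shares_memory(view, test))  # True

# The data is shared, so modifying the original is visible through every slice.
test[0, 0, 0] = 42.0
print(view[7, 0, 0, 0])  # 42.0
```

If you later need independent, writable copies, `view.copy()` materializes them (and pays the full copying cost).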

Notice that broadcast_to doesn't actually guarantee that it avoids copying, just that it returns some kind of readonly view where "more than one element of a broadcasted array may refer to a single memory location". In practice, it's going to avoid copying. If you actually need that to be guaranteed for some reason (or if you want a writable view—which is usually going to be a terrible idea, but maybe you have a good reason…), you have to drop down to as_strided:

def repeat(arr, count):
    shape = (count,) + arr.shape
    strides = (0,) + arr.strides
    return np.lib.stride_tricks.as_strided(
        arr, shape=shape, strides=strides, writeable=False)

Notice that half the docs for as_strided are warning that you probably shouldn't use it, and the other half are warning that you definitely shouldn't use it for writable views, so… make sure this is what you want before doing it.
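As a sanity check on the as_strided version, this sketch compares it with broadcast_to and then makes an explicit copy. The strides shown assume a C-contiguous float64 input like the one from np.random.randn; `owned` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def repeat(arr, count):
    shape = (count,) + arr.shape
    strides = (0,) + arr.strides  # stride 0 on the new axis: reuse the same memory
    return as_strided(arr, shape=shape, strides=strides, writeable=False)

test = np.random.randn(40, 40, 3)
view = repeat(test, 10)
print(view.strides)  # (0, 960, 24, 8)
print(np.array_equal(view, np.broadcast_to(test, view.shape)))  # True

# Materialize real copies explicitly if you need to write to them.
owned = view.copy()
print(owned.flags.writeable, np.shares_memory(owned, test))  # True False
```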

answered Oct 21 '22 17:10

abarnert

Of the many ways of creating a proper copy, preallocation + broadcasting seems fastest.

import numpy as np

def f_pp_0():
    out = np.empty((10, *a.shape), a.dtype)
    out[...] = a
    return out

def f_pp_1():
    out = np.empty((10, *a.shape), a.dtype)
    np.copyto(out, a)
    return out

def f_oddn():
    return np.repeat(a[np.newaxis,...], 10, axis=0)

def f_abar():
    return np.stack([a for _ in range(10)], axis=0)

def f_arry():
    return np.array(10*[a])

from timeit import timeit

a = np.random.random((40, 40, 3))

for f in list(locals().values()):
    if callable(f) and f.__name__.startswith('f_'):
        print(f.__name__, timeit(f, number=100000)/100, 'ms')

Sample run:

f_pp_0 0.019641224660445003 ms
f_pp_1 0.019557840081397444 ms
f_oddn 0.01983011547010392 ms
f_abar 0.03257150553865358 ms
f_arry 0.02305851033888757 ms

But the differences are small; for example, repeat is hardly slower, if at all.
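Timings are only comparable if every candidate returns the same array. A quick check, re-declaring a subset of the functions above so the snippet runs on its own (the name `reference` is hypothetical):

```python
import numpy as np

a = np.random.random((40, 40, 3))

def f_pp_0():
    # Preallocate, then let broadcasting fill every slice along axis 0.
    out = np.empty((10, *a.shape), a.dtype)
    out[...] = a
    return out

def f_pp_1():
    # Same idea, using np.copyto to do the broadcast assignment.
    out = np.empty((10, *a.shape), a.dtype)
    np.copyto(out, a)
    return out

def f_oddn():
    return np.repeat(a[np.newaxis, ...], 10, axis=0)

def f_arry():
    return np.array(10 * [a])

reference = f_oddn()
for f in (f_pp_0, f_pp_1, f_arry):
    out = f()
    print(f.__name__, np.array_equal(out, reference), out.dtype == reference.dtype)
# each line prints: <name> True True
```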

answered Oct 21 '22 18:10

Paul Panzer
