
numpy.tile a non-integer number of times

Is there a better way in numpy to tile an array a non-integer number of times? This gets the job done, but it is clunky and doesn't easily generalize to n dimensions:

import numpy as np
arr = np.arange(6).reshape((2, 3))
desired_shape = (5, 8)
reps = tuple([x // y for x, y in zip(desired_shape, arr.shape)])
left = tuple([x % y for x, y in zip(desired_shape, arr.shape)])
tmp = np.tile(arr, reps)
tmp = np.r_[tmp, tmp[slice(left[0]), :]]
tmp = np.c_[tmp, tmp[:, slice(left[1])]]

This yields:

array([[0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1]])
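
For what it's worth, the same tile-then-trim idea can probably be pushed to n dimensions by tiling one time too many (ceiling division) and cropping the excess, instead of appending the leftover rows and columns. This is only a rough sketch: `tile_crop` is a made-up helper name, and it assumes the target shape has one entry per axis of the input and that no axis of the input is empty.

```python
import numpy as np

def tile_crop(a, shape):
    """Tile `a` enough times to cover `shape`, then crop the excess.

    `tile_crop` is a hypothetical helper name, not part of NumPy.
    Assumes len(shape) == a.ndim and every axis of `a` is non-empty.
    """
    # Ceiling division: smallest repeat count that covers each target axis.
    reps = tuple(-(-n // k) for n, k in zip(shape, a.shape))
    # Crop each axis back down to the requested length.
    crop = tuple(slice(0, n) for n in shape)
    return np.tile(a, reps)[crop]

arr = np.arange(6).reshape((2, 3))
result = tile_crop(arr, (5, 8))  # same (5, 8) array as shown above
```

The trade-off is that it temporarily builds an array slightly larger than needed, which may matter for very large targets.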

EDIT: Performance results

Here are some tests of the three answers, each generalized to n dimensions. These definitions were put in a file newtile.py:

import numpy as np

def tile_pad(a, dims):
    return np.pad(a, tuple((0, i) for i in (np.array(dims) - a.shape)),
                  mode='wrap')

def tile_meshgrid(a, dims):
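    # tuple() keeps this working on NumPy versions where meshgrid returns a
    # list, which is no longer accepted as a multi-dimensional index.
    return a[tuple(np.meshgrid(*[np.arange(j) % k for j, k in zip(dims, a.shape)],
                               sparse=True, indexing='ij'))]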

def tile_rav_mult_idx(a, dims):
    return a.flat[np.ravel_multi_index(np.indices(dims), a.shape, mode='wrap')]
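
Before timing anything, it may be worth checking that a candidate function actually produces the wrapped tiling. The sketch below compares against a deliberately slow brute-force reference; `tile_reference` and `agrees` are hypothetical names introduced only for this check, and `tile_pad` is repeated here so the snippet runs on its own.

```python
import numpy as np

def tile_reference(a, dims):
    """Brute-force reference: fill each output cell by wrapping its index.

    `tile_reference` and `agrees` are hypothetical helper names used
    only for this check; they are slow and not meant for real use.
    """
    out = np.empty(dims, dtype=a.dtype)
    for idx in np.ndindex(*dims):
        # Wrap every coordinate back into the bounds of `a`.
        out[idx] = a[tuple(i % k for i, k in zip(idx, a.shape))]
    return out

def agrees(func, a, dims):
    """Return True if func(a, dims) matches the brute-force reference."""
    return np.array_equal(func(a, dims), tile_reference(a, dims))

def tile_pad(a, dims):
    # Same definition as in newtile.py.
    return np.pad(a, tuple((0, i) for i in (np.array(dims) - a.shape)),
                  mode='wrap')

small = np.arange(30).reshape(2, 3, 5)
ok = agrees(tile_pad, small, (3, 5, 7))
```

The other two functions can be passed to `agrees` in the same way.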

Here are the bash lines:

python -m timeit -s 'import numpy as np' 'import newtile' 'newtile.tile_pad(np.arange(30).reshape(2, 3, 5), (3, 5, 7))'
python -m timeit -s 'import numpy as np' 'import newtile' 'newtile.tile_meshgrid(np.arange(30).reshape(2, 3, 5), (3, 5, 7))'
python -m timeit -s 'import numpy as np' 'import newtile' 'newtile.tile_rav_mult_idx(np.arange(30).reshape(2, 3, 5), (3, 5, 7))'

python -m timeit -s 'import numpy as np' 'import newtile' 'newtile.tile_pad(np.arange(2310).reshape(2, 3, 5, 7, 11), (13, 17, 19, 23, 29))'
python -m timeit -s 'import numpy as np' 'import newtile' 'newtile.tile_meshgrid(np.arange(2310).reshape(2, 3, 5, 7, 11), (13, 17, 19, 23, 29))'
python -m timeit -s 'import numpy as np' 'import newtile' 'newtile.tile_rav_mult_idx(np.arange(2310).reshape(2, 3, 5, 7, 11), (13, 17, 19, 23, 29))'

Here are the results with small arrays (2 x 3 x 5):

pad:               10000 loops, best of 3: 106 usec per loop
meshgrid:          10000 loops, best of 3: 56.4 usec per loop
ravel_multi_index: 10000 loops, best of 3: 50.2 usec per loop

Here are the results with larger arrays (2 x 3 x 5 x 7 x 11):

pad:               10 loops, best of 3: 25.2 msec per loop
meshgrid:          10 loops, best of 3: 300 msec per loop
ravel_multi_index: 10 loops, best of 3: 218 msec per loop

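So np.pad is about twice as slow as the other two for small arrays, but roughly ten times faster for large ones, which makes it probably the most performant choice overall.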

drammock asked Oct 15 '14 04:10


2 Answers

Another solution which is even more concise:

arr = np.arange(6).reshape((2, 3))
desired_shape = np.array((5, 8))

pads = tuple((0, i) for i in (desired_shape-arr.shape))
# pads = ((0, add_rows), (0, add_columns), ...)
np.pad(arr, pads, mode="wrap")

It is slower for small arrays, though much faster for large ones. Strangely, np.pad won't accept an np.array for pads.
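
One caveat with this approach: as far as I know, np.pad rejects negative pad widths, so it only works when the desired shape is at least as large as the array along every axis. A possible workaround, sketched below, is to crop first and pad afterwards. `pad_or_crop` is a hypothetical helper name, and the sketch assumes one target length per axis and no empty axes.

```python
import numpy as np

def pad_or_crop(a, shape):
    """Wrap-pad axes that are too short and crop axes that are too long.

    `pad_or_crop` is a hypothetical helper, not part of NumPy.
    Assumes len(shape) == a.ndim and every axis of `a` is non-empty.
    """
    # Crop first so that no axis is longer than requested.
    cropped = a[tuple(slice(0, n) for n in shape)]
    # The remaining deficit per axis is now guaranteed to be >= 0.
    pads = tuple((0, n - k) for n, k in zip(shape, cropped.shape))
    return np.pad(cropped, pads, mode="wrap")

arr = np.arange(6).reshape((2, 3))
out = pad_or_crop(arr, (4, 2))  # grows the rows, shrinks the columns
```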

panda-34 answered Oct 03 '22 09:10


Here's a pretty concise method:

In [57]: a
Out[57]: 
array([[0, 1, 2],
       [3, 4, 5]])

In [58]: old = a.shape

In [59]: new = (5, 8)

In [60]: a[(np.arange(new[0]) % old[0])[:,None], np.arange(new[1]) % old[1]]
Out[60]: 
array([[0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1]])

Here's an n-dimensional generalization:

def rep_shape(a, shape):
    indices = np.meshgrid(*[np.arange(k) % j for j, k in zip(a.shape, shape)],
                          sparse=True, indexing='ij')
    # Index with a tuple; newer NumPy rejects a list of arrays as an index.
    return a[tuple(indices)]

For example:

In [89]: a
Out[89]: 
array([[0, 1, 2],
       [3, 4, 5]])

In [90]: rep_shape(a, (5, 8))
Out[90]: 
array([[0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1]])

In [91]: rep_shape(a, (4, 2))
Out[91]: 
array([[0, 1],
       [3, 4],
       [0, 1],
       [3, 4]])

In [92]: b = np.arange(24).reshape(2,3,4)

In [93]: b
Out[93]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [94]: rep_shape(b, (3,4,5))
Out[94]: 
array([[[ 0,  1,  2,  3,  0],
        [ 4,  5,  6,  7,  4],
        [ 8,  9, 10, 11,  8],
        [ 0,  1,  2,  3,  0]],

       [[12, 13, 14, 15, 12],
        [16, 17, 18, 19, 16],
        [20, 21, 22, 23, 20],
        [12, 13, 14, 15, 12]],

       [[ 0,  1,  2,  3,  0],
        [ 4,  5,  6,  7,  4],
        [ 8,  9, 10, 11,  8],
        [ 0,  1,  2,  3,  0]]])

Here's how the first example works...

The idea is to use arrays to index a. Take a look at np.arange(new[0]) % old[0]:

In [61]: np.arange(new[0]) % old[0]
Out[61]: array([0, 1, 0, 1, 0])

Each value in that array gives the row of a to use in the result. Similarly,

In [62]: np.arange(new[1]) % old[1]
Out[62]: array([0, 1, 2, 0, 1, 2, 0, 1])

gives the columns of a to use in the result. For these index arrays to create a 2-d result, we have to reshape the first one into a column:

In [63]: (np.arange(new[0]) % old[0])[:,None]
Out[63]: 
array([[0],
       [1],
       [0],
       [1],
       [0]])

When arrays are used as indices, they broadcast. Here's what the broadcast indices look like:

In [65]: i, j = np.broadcast_arrays((np.arange(new[0]) % old[0])[:,None], np.arange(new[1]) % old[1])

In [66]: i
Out[66]: 
array([[0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [67]: j
Out[67]: 
array([[0, 1, 2, 0, 1, 2, 0, 1],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [0, 1, 2, 0, 1, 2, 0, 1]])

These are the index arrays that we need to generate the array with shape (5, 8):

In [68]: a[i,j]
Out[68]: 
array([[0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1],
       [3, 4, 5, 3, 4, 5, 3, 4],
       [0, 1, 2, 0, 1, 2, 0, 1]])

When index arrays are given as in the example at the beginning (i.e. using (np.arange(new[0]) % old[0])[:,None] in the first index slot), numpy doesn't actually generate these index arrays in memory like I did with i and j. i and j show the effective contents when broadcasting occurs.

The function rep_shape does the same thing, using np.meshgrid to generate the index arrays for each "slot" with the correct shapes for broadcasting.
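
If it helps to see the "correct shapes for broadcasting" part in isolation, np.ix_ builds the same kind of open mesh from 1-d index arrays, so a variant of rep_shape can be sketched with it. `rep_shape_ix` is a hypothetical name, and this is an alternative spelling under the same assumption (one target length per axis), not a replacement for the function above.

```python
import numpy as np

def rep_shape_ix(a, shape):
    """Same wrap-around indexing as rep_shape, built with np.ix_.

    `rep_shape_ix` is a hypothetical name. Assumes len(shape) == a.ndim.
    """
    # One 1-d index array per axis: 0, 1, ..., k-1 wrapped modulo the axis length.
    index_1d = [np.arange(k) % j for j, k in zip(a.shape, shape)]
    # np.ix_ reshapes them to (k0, 1, ...), (1, k1, ...), ... so they broadcast.
    return a[np.ix_(*index_1d)]

a = np.arange(6).reshape(2, 3)
rows, cols = np.ix_(np.arange(5) % 2, np.arange(8) % 3)
# rows.shape is (5, 1) and cols.shape is (1, 8); together they broadcast to (5, 8).
tiled = rep_shape_ix(a, (5, 8))
```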

Warren Weckesser answered Oct 03 '22 08:10