repeated numpy subarrays




This is a simplification of my question. I have a numpy array:

x = np.array([0,1,2,3])

and I have a function:

def f(y): return y**2

I can compute f(x).

Now suppose I really want to compute f(x) for a repeated x:

x = np.array([0,1,2,3,0,1,2,3,0,1,2,3])
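For concreteness, the repeated array above is what `np.tile(x, 3)` builds. The sketch below shows that this is a full, separate allocation, which is exactly the cost the question wants to avoid. The dtype is pinned to `np.int64` here (an assumption) so the byte counts do not depend on the platform.

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=np.int64)

def f(y):
    return y**2

# np.tile materialises the repeated array: 3 copies of x in new memory.
x_rep = np.tile(x, 3)
print(x_rep)          # [0 1 2 3 0 1 2 3 0 1 2 3]
print(x_rep.nbytes)   # 96, i.e. 3 * 4 elements * 8 bytes
print(f(x_rep))       # f runs on the full-sized copy
```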

Is there a way to do this without creating a repeated version of x and in a way that is transparent to f?

In my particular case, f is an involved function and one of its arguments is x. I would like to calculate f on the repeated x without actually building the repeated array, because it won't fit into memory.

Rewriting f to handle a repeated x would be a lot of work, so I was hoping for a clever trick, possibly subclassing numpy's ndarray to do this.

Any tips appreciated.

You can (almost) do this by using a few tricks with strides.

However, there are some major caveats...

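import numpy as np
x = np.arange(4)
numrepeats = 3

# View x as a (numrepeats, 4) array: stride 0 along the new first axis,
# so every row starts at the same address as x
y = np.lib.stride_tricks.as_strided(x, (numrepeats,)+x.shape, (0,)+x.strides)

print(y)
x[0] = 9
print(y)  # every row now starts with 9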

So, y is now a view into x where each row is x. No new memory is used, and we can make y as large as we like.
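A small sketch that checks those claims directly: the leading stride is 0, `y` shares memory with `x`, and a write to `x` shows up in every row. It uses the keyword form of `as_strided` and pins `dtype=np.int64` (an assumption, so the stride is 8 bytes on every platform).

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(4, dtype=np.int64)
numrepeats = 3

# Stride 0 along the new leading axis: every row starts at the same address.
y = as_strided(x, shape=(numrepeats,) + x.shape, strides=(0,) + x.strides)

print(y.strides)               # (0, 8)
print(np.shares_memory(x, y))  # True
x[0] = 9
print(y[:, 0])                 # [9 9 9]: every row sees the change
```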

For example, I can do this:

import numpy as np
x = np.arange(4)
numrepeats = 10**15  # shape entries must be integers; 1e15 is a float

y = np.lib.stride_tricks.as_strided(x, (numrepeats,)+x.shape, (0,)+x.strides)

...and not use any more memory than the 32 bytes required for x (assuming 8-byte integers). A real copy of y would need about 32 petabytes of RAM (10**15 rows * 4 elements * 8 bytes).
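The arithmetic can be checked with `nbytes`, which is computed from the shape and item size (so it reports what a materialised copy would need), not from memory actually allocated. A sketch, assuming 8-byte integers via `dtype=np.int64`:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(4, dtype=np.int64)   # 4 elements * 8 bytes = 32 bytes
numrepeats = 10**15                # must be an int, not 1e15

y = as_strided(x, shape=(numrepeats,) + x.shape, strides=(0,) + x.strides)

print(x.nbytes)  # 32: the only memory actually in use
print(y.nbytes)  # 32000000000000000 bytes (~32 PB) if y were ever copied
```

Printing or reducing over all of `y` would still loop over 4 * 10**15 elements, so the view saves memory, not time.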

However, if we reshape y so that it only has one dimension, we'll get a copy that uses the full amount of memory. A 1-D array has a single stride, so there is no way to describe a flat, end-to-end ("horizontally" tiled) view of x using strides and shape alone; any reshape of y to fewer than 2 dimensions will return a copy.
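A small-scale sketch of that copy: flattening the strided view produces the tiled values, but in new memory, so writes to the flattened array no longer reach `x`.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(4, dtype=np.int64)
y = as_strided(x, shape=(3,) + x.shape, strides=(0,) + x.strides)

# No single 1-D stride can jump back to x[0] after x[3], so reshape must copy.
flat = y.reshape(-1)
print(flat)                       # [0 1 2 3 0 1 2 3 0 1 2 3]
print(np.shares_memory(flat, x))  # False: flat is an independent copy
flat[0] = 100
print(x[0])                       # 0: x is unaffected
```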

Additionally, any operation on y that returns a new array instead of modifying one in place (e.g. the y**2 in your example) allocates a full-sized result.
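A sketch of the out-of-place case: `y**2` returns an ordinary array whose size is `numrepeats` times that of `x`, unrelated to `x`'s memory (again pinning `dtype=np.int64` as an assumption for the byte count).

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(4, dtype=np.int64)
y = as_strided(x, shape=(3,) + x.shape, strides=(0,) + x.strides)

r = y**2                       # allocates a new, fully materialised array
print(r.nbytes)                # 96, i.e. 3 * x.nbytes
print(np.shares_memory(r, x))  # False
```

With `numrepeats = 10**15` the same expression would try to allocate the full ~32 PB.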

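For that reason, it makes more sense to operate in place on x itself (e.g. x **= 2); because y is a view into x, it sees the result immediately. Avoid writing through y (e.g. y **= 2): its rows all share the same memory, and the NumPy documentation for as_strided warns that vectorized writes to such self-overlapping arrays are unpredictable.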
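A sketch of the in-place approach: modify the small base array and read the result through the view. It also swaps in `np.broadcast_to`, a NumPy function (not used in the original answer) that builds the same zero-stride view but marks it read-only, which guards against accidental writes through the overlapping rows.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(4, dtype=np.int64)
y = as_strided(x, shape=(3,) + x.shape, strides=(0,) + x.strides)

# Update the small base array in place; the view y reflects it for free.
x **= 2
print(y.tolist())  # [[0, 1, 4, 9], [0, 1, 4, 9], [0, 1, 4, 9]]

# Read-only alternative: same memory, but writes to yb raise an error.
yb = np.broadcast_to(x, (3,) + x.shape)
print(yb.flags.writeable)  # False
```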

Even for a generic function, you can pass in x and place the result back in x.


def f(x):
    return x**3

# f only ever sees the small array x; write its result back into x
x[...] = f(x)
print(y)

y is updated as well, since it is just a view into x.
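A self-contained sketch of that pattern. Any temporaries inside `f` are only the size of `x`, not of `y`. When `f` is a single NumPy ufunc, its `out=` argument can write directly into `x` and skip even that temporary; this second step is an addition, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def f(x):
    return x**3

x = np.arange(4, dtype=np.int64)
y = as_strided(x, shape=(3,) + x.shape, strides=(0,) + x.strides)

# Generic function: compute on the small array, copy the result back.
x[...] = f(x)
print(y[2].tolist())  # [0, 1, 8, 27]

# Single ufunc: out=x writes the result straight into x's memory.
np.power(x, 2, out=x)
print(y[0].tolist())  # [0, 1, 64, 729]
```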

