Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy: create a matrix from a cartesian product of two vectors (or one with itself) while applying a function to all pairs

To give a bit of explanation, I want to create a covariance matrix where each element is defined by a kernel function k(x, y), and I want to do this for a single vector. It should be something like:

# This is given
x = [x1, x2, x3, x4, ...]

# This is what I want to compute
result = [[k(x1, x1), k(x1, x2), k(x1, x3), ...],
          [k(x2, x1), k(x2, x2), ...],
          [k(x3, x1), k(x3, x2), ...],
          ...]

but of course this should be done in numpy arrays, ideally without doing Python interations, because of performance. If I didn't care about performance, I'd probably just write:

result = np.zeros((len(x), len(x)))

for i in range(len(x)):
  for j in range(len(x)):
     result[i, j] = k(x[i], x[j])

But I feel like there must be a more idiomatic way to write this pattern.

like image 887
Jakub Arnold Avatar asked Sep 15 '25 19:09

Jakub Arnold


1 Answers

If k operates on 2D arrays, you could use np.meshgrid. But, this would have extra memory overhead. One alternative would be to create 2D mesh views same as with np.meshgrid, like so -

def meshgrid1D_view(x):
    shp = (len(x),len(x))
    mesh1 = np.broadcast_to(x,shp)
    mesh2 = np.broadcast_to(x[:,None],shp)    
    return mesh1, mesh2

Sample run -

In [140]: x
Out[140]: array([3, 5, 6, 8])

In [141]: np.meshgrid(x,x)
Out[141]: 
[array([[3, 5, 6, 8],
        [3, 5, 6, 8],
        [3, 5, 6, 8],
        [3, 5, 6, 8]]), array([[3, 3, 3, 3],
        [5, 5, 5, 5],
        [6, 6, 6, 6],
        [8, 8, 8, 8]])]
In [142]: meshgrid1D(x)
Out[142]: 
(array([[3, 5, 6, 8],
        [3, 5, 6, 8],
        [3, 5, 6, 8],
        [3, 5, 6, 8]]), array([[3, 3, 3, 3],
        [5, 5, 5, 5],
        [6, 6, 6, 6],
        [8, 8, 8, 8]]))

How does this help?

It helps with memory efficiency and hence performance. Let's test out on large arrays to see the difference -

In [143]: x = np.random.randint(0,10,(10000))

In [144]: %timeit np.meshgrid(x,x)
10 loops, best of 3: 171 ms per loop

In [145]: %timeit meshgrid1D(x)
100000 loops, best of 3: 6.91 µs per loop
like image 72
Divakar Avatar answered Sep 18 '25 09:09

Divakar