Best practice for fancy indexing a numpy array along multiple axes

I'm trying to optimize an algorithm to reduce memory usage, and I've identified this particular operation as a pain point.

I have a symmetric matrix, an index array along the rows, and another index array along the columns (which is just all values that I wasn't selecting in the row index). I feel like I should be able to pass in both indexes at the same time, but I'm forced to select along one axis and then the other. That causes memory issues, because I don't actually need the array that's returned, only the statistics I calculate from it. Here's what I'm trying to do:

from scipy.spatial.distance import pdist, squareform
from sklearn import datasets
import numpy as np

iris = datasets.load_iris().data

dx = pdist(iris)
mat = squareform(dx)

outliers = [41,62,106,108,109,134,135]
inliers = np.setdiff1d(range(iris.shape[0]), outliers)

# What I want to be able to do (as written, this raises IndexError):
scores = mat[inliers, outliers].min(axis=0)
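Why that line fails: when every index is an integer array, NumPy broadcasts the index arrays against each other and picks one element per broadcast position, so two 1-D arrays are paired elementwise rather than combined into a block. With 143 inliers and 7 outliers, shapes (143,) and (7,) cannot broadcast. A small sketch on a toy 5x5 symmetric matrix (a stand-in, not the iris data):

```python
import numpy as np

# Toy symmetric matrix: mat[i, j] == 6 * (i + j)
base = np.arange(25).reshape(5, 5)
mat = base + base.T

rows = np.array([0, 1, 2])
cols = np.array([3, 4, 1])

# Equal-length index arrays are paired elementwise:
# this selects mat[0, 3], mat[1, 4] and mat[2, 1], not a 3x3 block.
paired = mat[rows, cols]
print(paired)  # [18 30 18]

# Index arrays of shapes (3,) and (2,) cannot be broadcast together,
# the same failure as (143,) vs (7,) in the question.
raised = False
try:
    mat[rows, np.array([3, 4])]
except IndexError:
    raised = True
print(raised)  # True
```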

Here's what I'm actually doing to make this work:

# What I'm being forced to do:
s1 = mat[:,outliers]                # copy of shape (150, 7)
scores = s1[inliers,:].min(axis=0)  # second copy of shape (143, 7), then the min

Because I'm using fancy indexing, s1 is a new array rather than a view. I only need it for one operation, so I'd prefer to avoid the copy entirely, or at least make it smaller by applying both index selections in a single fancy-indexing operation instead of two separate ones.
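To make the cost concrete, here is a minimal sketch in which a random symmetric distance matrix stands in for squareform(pdist(iris)) (an assumption, so it runs without scikit-learn or SciPy; `pts` is a hypothetical name). It shows the two-step approach allocating a (150, 7) intermediate before the (143, 7) block, and that the intermediate does not share memory with mat.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for squareform(pdist(iris)): Euclidean distances
# between 150 random 4-D points, giving a symmetric 150x150 matrix
pts = rng.normal(size=(150, 4))
mat = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))

outliers = [41, 62, 106, 108, 109, 134, 135]
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# Step 1 copies the outlier columns for every row: shape (150, 7)
s1 = mat[:, outliers]
# Step 2 copies again, keeping only the inlier rows: shape (143, 7)
s2 = s1[inliers, :]
scores = s2.min(axis=0)

print(s1.shape, s2.shape)         # (150, 7) (143, 7)
print(np.shares_memory(s1, mat))  # False: fancy indexing returns a copy
```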

asked Jan 10 '23 13:01 by David Marx


2 Answers

"Broadcasting" applies to indexing, too. You could convert inliers into a column array (e.g. inliers.reshape(-1,1) or inliers[:, np.newaxis], so it has shape (m,1)) and use that as the first index into mat:

s1 = mat[inliers.reshape(-1,1), outliers]
scores = s1.min(axis=0)
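A quick check of the broadcast indexing, again on a random symmetric stand-in for the distance matrix (an assumption, not the iris data): the (143, 1) row index broadcasts against the (7,) column index, so the (143, 7) block is built in one step.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical symmetric stand-in for the 150x150 distance matrix
base = rng.random((150, 150))
mat = base + base.T

outliers = np.array([41, 62, 106, 108, 109, 134, 135])
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# (143, 1) row index broadcast against (7,) column index -> (143, 7) result
block = mat[inliers[:, np.newaxis], outliers]
scores = block.min(axis=0)

# The two-step version is recomputed here only to confirm the results match
two_step = mat[:, outliers][inliers, :].min(axis=0)
print(block.shape)                       # (143, 7)
print(np.array_equal(scores, two_step))  # True
```

The block is still a new array, since integer-array indexing always copies; the saving is that only the (143, 7) block is allocated, with no (150, 7) intermediate.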
answered Jan 27 '23 15:01 by Warren Weckesser


A more readable way is np.ix_, which builds the broadcastable index arrays for you:

result = mat[np.ix_(inliers, outliers)].min(0)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html#numpy.ix_
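A small sketch of what np.ix_ returns, using a toy 5x5 matrix and short index arrays (stand-ins for the real data): it produces an open mesh of index arrays shaped (m, 1) and (1, k), which is the same thing the reshape approach builds by hand.

```python
import numpy as np

inliers = np.array([0, 2, 3])
outliers = np.array([1, 4])

# np.ix_ turns 1-D sequences into broadcastable index arrays (an "open mesh")
rows, cols = np.ix_(inliers, outliers)
print(rows.shape, cols.shape)  # (3, 1) (1, 2)

mat = np.arange(25).reshape(5, 5)  # toy matrix: mat[i, j] == 5 * i + j
block = mat[np.ix_(inliers, outliers)]
print(block)
# [[ 1  4]
#  [11 14]
#  [16 19]]

# Equivalent to reshaping the row index by hand
print(np.array_equal(block, mat[inliers[:, np.newaxis], outliers]))  # True
```

In the answer's one-liner, .min(0) is the same as .min(axis=0), because axis is the first positional parameter of ndarray.min.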

answered Jan 27 '23 15:01 by alyaxey