
Select certain rows (condition met), but only some columns in Python/Numpy


I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 for the rows where the value in the second column meets a certain condition (i.e. equals a fixed value). I first tried selecting only the rows, keeping all 4 columns, via:

I = A[A[:,1] == i] 

which works. Then I tried the following (similar to MATLAB, which I know very well):

I = A[A[:,1] == i, [0,2,3]] 

which doesn't work. How can I do this?


>>> import numpy as np
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
 [6 1 3 4]
 [3 2 5 6]]
>>> i = 2

# I want to get the columns 1, 3 and 4
# for every row which has the value i in the second column.
# In this case, this would be row 1 and 3 with columns 1, 3 and 4:
[[1 3 4]
 [3 5 6]]

I am currently using this:

I = A[A[:,1] == i]
I = I[:, [0,2,3]]

But I thought that there had to be a nicer way of doing it... (I am used to MATLAB)

tim asked May 28 '14 12:05



1 Answer

>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # select columns
array([[ 5,  6,  8],
       [ 9, 10, 12]])

# fancier equivalent of the previous
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])

For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323
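As a rough sketch of what np.ix_() does here (my own illustration, not taken from the linked answer): it converts the boolean row mask into integer indices and reshapes the row and column selectors so that they broadcast against each other into a grid. Without that reshaping, NumPy tries to pair the row indices with the column indices element by element, which is why the A[A[:,1] == i, [0,2,3]] attempt in the question fails. The names row_mask, cols, manual and via_ix below are only for this example.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row_mask = a[:, 0] > 3      # boolean mask: [False, True, True]
cols = [0, 1, 3]            # integer column indices

# np.ix_ builds an "open mesh": a column vector of row indices
# and a row vector of column indices.
rows_ix, cols_ix = np.ix_(row_mask, cols)
print(rows_ix.shape, cols_ix.shape)   # (2, 1) (1, 3)
via_ix = a[rows_ix, cols_ix]

# The same selection written out by hand with explicit broadcasting.
rows = np.flatnonzero(row_mask)
manual = a[rows[:, None], cols]

# Passing the mask and the column list directly asks NumPy to pair
# 2 row indices with 3 column indices, which cannot be broadcast.
try:
    a[row_mask, cols]
    paired_ok = True
except IndexError:
    paired_ok = False
```

Both via_ix and manual hold the rows that pass the mask, restricted to columns 0, 1 and 3; the direct pairing raises an IndexError because the two index arrays have lengths 2 and 3.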

Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:

>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])
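Applied to the array from the question, the same pattern would look roughly like this. This is a minimal sketch that reuses the question's A and i; the name I_two_step is only there to compare against the two-step version the asker already uses.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

# Rows whose second column equals i, restricted to columns 0, 2 and 3.
I = A[np.ix_(A[:, 1] == i, [0, 2, 3])]
print(I)
# [[1 3 4]
#  [3 5 6]]

# The two-step version from the question selects the same values.
I_two_step = A[A[:, 1] == i][:, [0, 2, 3]]
```

Either form returns a new array rather than a view, so the choice between them is mostly about readability.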
John Zwinck answered Oct 13 '22 08:10
