Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas: Get index of multiple rows which column matches certain value

Given a DataFrame with the columns xk and yk, we want to find the indexes of the DataFrame in which the values for xk and yk ==0.

I have it working perfectly fine for just the one column but I cant get it working for both

b = (df[df['xk'] ==0]).index.tolist()

How would I do it for xk and yk at the same time.

like image 648
WGP Avatar asked Aug 02 '16 14:08

WGP


People also ask

How do you count occurrences of specific value in pandas column?

We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.

How do you select a specific value from a pandas series?

Select Data Using Location Index (. This means that you can use dataframe. iloc[0:1, 0:1] to select the cell value at the intersection of the first row and first column of the dataframe. You can expand the range for either the row index or column index to select more data.

How do I find similar columns in pandas?

To find duplicate columns we need to iterate through all columns of a DataFrame and for each and every column it will search if any other column exists in DataFrame with the same contents already. If yes then that column name will be stored in the duplicate column set.


1 Answers

I think you can check if all values are True in compared subset ['xk', 'yk'] by all:

b = df[(df[['xk', 'yk']] == 0).all(1)].index.tolist()

Another solution is add second condition with &:

b = (df[(df['xk']  == 0) & (df['yk'] == 0)].index.tolist())

Sample:

df = pd.DataFrame({'xk':[0,2,3],
                   'yk':[0,5,0],
                   'aa':[0,1,0]})

print (df)
   aa  xk  yk
0   0   0   0
1   1   2   5
2   0   3   0

b = df[(df[['xk', 'yk']] == 0).all(1)].index.tolist()
print (b)
[0]

b1 = (df[(df['xk']  == 0) & (df['yk'] == 0)].index.tolist())
print (b1)
[0]

Second solution is faster:

#length of df = 3k
df = pd.concat([df]*1000).reset_index(drop=True)

In [294]: %timeit df[(df[['xk', 'yk']] == 0).all(1)].index.tolist()
1000 loops, best of 3: 1.21 ms per loop

In [295]: %timeit (df[(df['xk']  == 0) & (df['yk'] == 0)].index.tolist())
1000 loops, best of 3: 828 µs per loop
like image 134
jezrael Avatar answered Nov 14 '22 23:11

jezrael