 

Selecting rows from a Dataframe based on values in multiple columns in pandas

Tags: python, pandas

This question is closely related to another one, and I'll reuse the example from the very helpful accepted solution to that question (credit to unutbu):

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

But what if I want to pick out all rows where column A is 'foo' and column B is 'one'? Here that would be rows 0 and 6. My attempt was:

print(df.loc[df['A'] == 'foo' and df['B'] == 'one'])

Unfortunately, this does not work. Can anybody suggest a way to implement something like this? Ideally it would be general enough to handle a more complex set of conditions involving "and" and "or", though I don't actually need that for my purposes.

Shane asked Jul 31 '15 22:07



2 Answers

Only a very small change is needed in your code: replace the keyword and with the operator & (and add parentheses so that each comparison is evaluated before they are combined):

In [104]: df.loc[(df['A'] == 'foo') & (df['B'] == 'one')]
Out[104]:
     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12

You have to use & because it combines the two boolean Series element-wise, whereas and expects each operand to evaluate to a single True or False, which a Series cannot do. The parentheses are required because & binds more tightly than ==.
Similarly, when you want an "or" comparison, use | instead.
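
To make the difference concrete, here is a minimal sketch using the frame from the question. The variable names (and_error, both, either, mixed, via_query) are only illustrative, and the df.query call at the end is an alternative that the answer above does not mention; it is shown because it accepts the words "and" and "or" inside the expression string.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Plain `and` asks each Series for a single truth value, which pandas refuses.
try:
    df.loc[df['A'] == 'foo' and df['B'] == 'one']
except ValueError as err:
    and_error = str(err)

# Element-wise AND: rows where A is 'foo' and B is 'one'.
both = df.loc[(df['A'] == 'foo') & (df['B'] == 'one')]

# Element-wise OR: rows where A is 'bar' or B is 'three'.
either = df.loc[(df['A'] == 'bar') | (df['B'] == 'three')]

# Masks can be combined further; ~ negates a mask element-wise.
mixed = df.loc[(df['A'] == 'foo') & ~(df['B'] == 'two') & (df['C'] > 0)]

# Alternative (not from the answer): DataFrame.query with a string expression.
via_query = df.query("A == 'foo' and B == 'one'")

print(both.index.tolist())       # [0, 6]
print(either.index.tolist())     # [1, 3, 5, 7]
print(mixed.index.tolist())      # [6, 7]
print(via_query.index.tolist())  # [0, 6]
```

Each comparison is wrapped in its own parentheses; dropping them would make Python try to evaluate 'foo' & df['B'] first.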

joris answered Oct 03 '22 00:10


You can also do this with a small change to your code, by applying the two conditions one after the other:

print(df[df['A'] == 'foo'][df['B'] == 'one'])

Output:

     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12
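
One caveat with the chained form: the second mask is built from the full df but applied to the already filtered frame, so pandas has to realign it, and depending on the version it may emit a warning about the boolean key being reindexed. A hedged variant, assuming an intermediate variable is acceptable (foo_rows and result are illustrative names), builds the second mask from the filtered frame instead:

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# First filter: keep only the rows where A is 'foo'.
foo_rows = df[df['A'] == 'foo']

# Second filter: the mask comes from foo_rows itself, so its index
# already matches the frame being filtered.
result = foo_rows[foo_rows['B'] == 'one']

print(result.index.tolist())  # [0, 6]
```

This only covers "and"-style narrowing; an "or" condition still needs | on masks over the same frame.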

Geeocode answered Oct 03 '22 01:10