This question is closely related to another one, and I'll even use the example from the very helpful accepted answer to that question. Here's the example from the accepted answer (credit to unutbu):
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14
print(df.loc[df['A'] == 'foo'])
yields
     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14
But what if I want to pick out all rows where column A is 'foo' and column B is 'one'? Here that would be rows 0 and 6. My attempt at it is:
print(df.loc[df['A'] == 'foo' and df['B'] == 'one'])
This does not work, unfortunately: it raises "ValueError: The truth value of a Series is ambiguous". Can anybody suggest a way to implement something like this? Ideally it would be general enough to allow a more complex set of conditions involving 'and' and 'or', though I don't actually need that for my purposes.
Selecting rows in a pandas DataFrame based on conditions: rows can be selected on a particular column value using the '>', '>=', '==', '<=' and '!=' operators; rows whose column value is present in a list can be selected with the DataFrame's isin() method; and rows can be selected on multiple column conditions using the '&' operator.
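As a minimal sketch of the single-column cases listed above, here they are applied to the example frame from the question (the variable names ge_rows, ne_rows and in_list are just illustrative):

```python
import numpy as np
import pandas as pd

# Same example frame as in the question
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Comparison operator on one column: rows where C is at least 5
ge_rows = df[df['C'] >= 5]

# != keeps every row whose B value is not 'two'
ne_rows = df[df['B'] != 'two']

# isin(): rows whose B value appears in the given list
in_list = df[df['B'].isin(['two', 'three'])]

print(ge_rows.index.tolist())  # [5, 6, 7]
print(ne_rows.index.tolist())  # [0, 1, 3, 6, 7]
print(in_list.index.tolist())  # [2, 3, 4, 5, 7]
```

Each expression inside the brackets produces a boolean Series, and indexing with it keeps only the rows marked True.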
How do you select rows from a dataframe based on column values? The rows of a dataframe can be selected based on conditions, much as we do with SQL queries. The various methods to achieve this are explained in this article with examples. To explain the methods, a dataset has been created which contains the points scored by 10 people in various games.
Code #1: Selecting all the rows from the given dataframe in which 'Age' is equal to 21 and 'Stream' is present in the options list, using the basic method.
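The code for Code #1 is not included here, so the following is only a sketch of that kind of query. The 'Age' and 'Stream' column names and the options list come from the description; the 'Name' column and all the values are hypothetical, made up for illustration:

```python
import pandas as pd

# Hypothetical data: names and values are invented for this example
students = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cal', 'Dee', 'Eve', 'Fay'],
    'Age': [21, 19, 20, 18, 21, 21],
    'Stream': ['Math', 'Commerce', 'Math', 'Biology', 'Science', 'Arts'],
})

options = ['Math', 'Science']

# Both conditions must hold: Age equals 21 AND Stream is one of the options
result = students[(students['Age'] == 21) & (students['Stream'].isin(options))]
print(result)
```

Row 5 has Age 21 but its Stream ('Arts') is not in the options list, so only rows 0 and 4 survive.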
This can be achieved in various ways. The query used is: select rows where the column Pid is 'p01'. The selected rows are assigned to a new dataframe that keeps the row index labels from the old dataframe, and the columns remain the same.
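A minimal sketch of that Pid query, assuming a hypothetical table: only the 'Pid' column name and the value 'p01' come from the text, while the 'Qty' column and all the data are invented. It shows that the selection keeps the original index labels and all columns:

```python
import pandas as pd

# Hypothetical product table for illustration
products = pd.DataFrame({'Pid': ['p01', 'p02', 'p01', 'p03'],
                         'Qty': [5, 3, 8, 1]})

# Boolean selection returns a new DataFrame; the matching rows keep
# their original index labels (0 and 2 here) and every column is retained
selected = products[products['Pid'] == 'p01']
print(selected)
```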
There is only a very small change needed in your code: replace the and with & (and add parentheses around each comparison, because & binds more tightly than ==, so without them the comparisons would be grouped incorrectly):
In [104]: df.loc[(df['A'] == 'foo') & (df['B'] == 'one')]
Out[104]:
     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12
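The reason you have to use & is that it performs the comparison element-wise on arrays, whereas and expects two expressions that each evaluate to a single True or False. Python calls bool() on each Series, and pandas raises a ValueError because the truth value of a whole Series is ambiguous.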
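A small sketch of that difference, reusing the question's frame: the and version fails because Python asks each Series for a single boolean, while & builds one boolean per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# `and` calls bool() on the left-hand Series, which pandas refuses
message = ""
try:
    df.loc[df['A'] == 'foo' and df['B'] == 'one']
except ValueError as err:
    message = str(err)
print(message)  # starts with "The truth value of a Series is ambiguous"

# `&` combines the two boolean Series row by row
mask = (df['A'] == 'foo') & (df['B'] == 'one')
print(mask.tolist())  # [True, False, False, False, False, False, True, False]
print(df.loc[mask])
```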
Similarly, when you want an or comparison, you can use | in this case.
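For the more general case the question mentions (mixing and with or), here is a sketch on the same frame using | for "or", ~ for "not", and parentheses to control grouping. The last line shows DataFrame.query as an alternative that accepts the and/or keywords inside its expression string; the specific conditions are arbitrary examples:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# | keeps a row when either condition holds
either = df.loc[(df['A'] == 'bar') | (df['C'] > 5)]

# ~ negates a mask: A is 'foo' but B is NOT 'one'
foo_not_one = df.loc[(df['A'] == 'foo') & ~(df['B'] == 'one')]

# Mixed and/or: the parentheses decide the grouping
mixed = df.loc[((df['A'] == 'foo') & (df['B'] == 'one')) | (df['D'] >= 10)]

# query() parses the plain Python keywords and/or inside the string
same_via_query = df.query("(A == 'foo' and B == 'one') or D >= 10")

print(either.index.tolist())       # [1, 3, 5, 6, 7]
print(foo_not_one.index.tolist())  # [2, 4, 7]
print(mixed.index.tolist())        # [0, 5, 6, 7]
```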
You can also do this with a small change to your code, by chaining two boolean selections:
print(df[df['A'] == 'foo'][df['B'] == 'one'])
Output:
     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12
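The chained version filters twice, but its second mask is built from the full df while it is applied to the already filtered frame, so pandas has to reindex it and emits a UserWarning ("Boolean Series key will be reindexed to match DataFrame index"). One way to keep the two-step style without that warning, sketched below, is to pass a callable to .loc so the second mask is computed from the filtered frame itself:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Turn UserWarnings into errors so a reindexing warning would fail loudly
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    # The lambda receives the frame already filtered on A == 'foo',
    # so its mask shares that frame's index and needs no reindexing
    chained = df[df['A'] == 'foo'].loc[lambda d: d['B'] == 'one']

print(chained)
```

For a single filter, the combined mask with & remains the simpler choice; the callable form mainly helps when filters are applied step by step.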