I'm a newcomer to pandas. For a DataFrame like:
N Chem Val
A Sodium 9
B Sodium 10
A Chlorid 7
B Chlorid 10
A Sodium 17
I'd like to do something like grep in bash: select the lines containing 'A' in the 1st column and 'Sodium' in the 2nd column:
A Sodium 9
A Sodium 17
How should I do this? I guess I need to use df[].str.contains()?
Thanks
You can use .str.contains() on a column of the data frame to return a boolean Series. You can also combine multiple boolean Series with the logical AND (&) and OR (|) operators. Finally, passing a boolean Series as a key to a data frame returns only the rows where the Series is True.
bool1 = df.N.str.contains('A')  # True for rows where N contains 'A'
bool2 = df.Chem.str.contains('Sodium')  # True for rows where Chem contains 'Sodium'
df[bool1 & bool2]  # selects rows where both conditions hold
returns (without including the index):
N Chem Val
A Sodium 9
A Sodium 17
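One thing worth checking with this approach: .str.contains() does a substring search (and treats the pattern as a regular expression by default), so it is looser than an equality test. The sketch below rebuilds the sample data inline and adds one hypothetical row with N == 'AB', which is not in the original data, purely to show the difference.

```python
import pandas as pd

# Sample data from the question, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Substring match on both columns, combined with a logical AND.
mask_contains = df["N"].str.contains("A") & df["Chem"].str.contains("Sodium")
result_contains = df[mask_contains]

# Hypothetical extra row (not in the question): 'AB' contains 'A'.
extra = pd.concat(
    [df, pd.DataFrame({"N": ["AB"], "Chem": ["Sodium"], "Val": [3]})],
    ignore_index=True,
)

# contains() also picks up the 'AB' row; == only keeps exact matches.
loose = extra[extra["N"].str.contains("A") & extra["Chem"].str.contains("Sodium")]
strict = extra[(extra["N"] == "A") & (extra["Chem"] == "Sodium")]

print(result_contains)
print(len(loose), len(strict))
```

On the original five rows both approaches give the same answer, so the choice only matters when other values happen to contain the search string.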
In my opinion, using query is the most natural way to express this type of selection:
df.query('N == "A" & Chem == "Sodium"')
N Chem Val
0 A Sodium 9
4 A Sodium 17
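If the values to match live in Python variables rather than in the query string, query can reference them with the @ prefix. This is a minimal sketch on the sample data; the variable names n_key and chem_key are made up for illustration.

```python
import pandas as pd

# Sample data from the question, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Hypothetical variable names holding the values to match.
n_key = "A"
chem_key = "Sodium"

# '@name' refers to a Python variable from the surrounding scope.
selected = df.query("N == @n_key and Chem == @chem_key")
print(selected)
```

This avoids nesting quotes inside the query string and keeps the original row index (0 and 4 here).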
If you just mean selecting rows by exact values in both columns, it's better not to use contains. It is meant for cases where you need to pick out sodium_A, sodium_B, etc. from among other strings, which means it can be slower than a basic comparison on multiple columns.
import pandas as pd
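# Your sample data; header=None keeps the 'N Chem Val' line as data row 0
# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_table('sample.txt', header=None, sep=r'\s+')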
print(df[(df.loc[:, 0] == 'A') & (df.loc[:, 1] == 'Sodium')])
0 1 2
1 A Sodium 9
5 A Sodium 17
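As a variation on the snippet above, here is a sketch that lets pandas use the first line as column names, so the filter can be written with N and Chem instead of 0 and 1. It reads the sample from an in-memory string rather than sample.txt so that it runs on its own. The isin line is an extra illustration for matching several exact keys at once.

```python
import io

import pandas as pd

# The sample data from the question as whitespace-separated text.
text = """N Chem Val
A Sodium 9
B Sodium 10
A Chlorid 7
B Chlorid 10
A Sodium 17
"""

# The first line is used as the header, so columns are named N, Chem, Val.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Exact match on both columns.
exact = df[(df["N"] == "A") & (df["Chem"] == "Sodium")]

# Exact match against several allowed values in one column.
several = df[(df["N"] == "A") & df["Chem"].isin(["Sodium", "Chlorid"])]

print(exact)
print(several)
```

Because the header line is no longer counted as a data row, the matching rows are indexed 0 and 4 here rather than 1 and 5.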