I'm new to pandas. Given a dataframe like this:
N  Chem    Val
A  Sodium  9
B  Sodium  10
A  Chlorid 7
B  Chlorid 10
A  Sodium  17
I'd like to do something like grep in bash: select the rows containing 'A' in the 1st column and 'Sodium' in the 2nd column:
A  Sodium  9
A  Sodium  17
How should I do this? I guess I need to use df[].str.contains()?
Thanks
You can use .str.contains() on a column of the data frame to return a boolean Series. You can also combine several boolean Series element-wise with & (and) and | (or). Finally, passing a boolean Series as a key to a data frame returns only the rows where it is True.
bool1 = df.N.str.contains('A')          # True for rows where N contains 'A'
bool2 = df.Chem.str.contains('Sodium')  # True for rows where Chem contains 'Sodium'
df[bool1 & bool2]   # selects rows where N contains 'A' AND Chem contains 'Sodium'
returns (without including the index):
N  Chem    Val
A  Sodium  9
A  Sodium  17
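For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data in memory (the question does not show how df was created, so the constructor call is an assumption). It also passes regex=False and na=False, which are optional: by default str.contains interprets the pattern as a regular expression, and missing values would otherwise produce NaN in the mask.

```python
import pandas as pd

# Sample data from the question, rebuilt in memory (assumed construction).
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# regex=False: treat the pattern as a literal substring, not a regex.
# na=False: rows with missing values count as "no match".
in_n = df["N"].str.contains("A", regex=False, na=False)
in_chem = df["Chem"].str.contains("Sodium", regex=False, na=False)

# Element-wise AND of the two boolean Series, then use it as a row filter.
selected = df[in_n & in_chem]
print(selected.to_string(index=False))
```

The original index labels (0 and 4) are kept in selected; to_string(index=False) only hides them when printing.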
In my opinion, using query is the most natural way to express this type of selection:
df.query('N == "A" & Chem == "Sodium"')
   N    Chem  Val
0  A  Sodium    9
4  A  Sodium   17
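If the values to match live in Python variables rather than in the query string, query can reference them with an @ prefix. A small sketch, assuming the same sample data; the variable names name and chem are made up for illustration:

```python
import pandas as pd

# Sample data from the question, rebuilt in memory (assumed construction).
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Hypothetical variables holding the values to filter on.
name = "A"
chem = "Sodium"

# "@name" and "@chem" refer to the local Python variables above;
# "and" can be used in place of "&" inside the query string.
result = df.query("N == @name and Chem == @chem")
print(result)
```

This avoids building the query string by hand and having to deal with nested quotes.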
If you just mean selecting rows by exact values in both columns, it's better not to use contains. It is meant for substring matching, i.e. for the case when you have to pick sodium_A, sodium_B, etc. out of other strings, which also means it can be slower than a plain equality comparison on multiple columns.
import pandas as pd
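# Your sample data. With header=None the header line is read as data row 0
# and the columns are numbered 0, 1, 2.
# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas).
df = pd.read_table('sample.txt', header=None, sep=r'\s+')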
print(df[(df.loc[:, 0] == 'A') & (df.loc[:, 1] == 'Sodium')])
   0       1   2
1  A  Sodium   9
5  A  Sodium  17
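To make the difference between the two approaches concrete, here is a small sketch on a made-up Series (the values are hypothetical, chosen to mirror the sodium_A case mentioned above). contains matches any string that includes the pattern, while == only matches the exact value:

```python
import pandas as pd

# Hypothetical values: one exact match, one string that merely contains it.
chem = pd.Series(["Sodium", "Sodium_A", "Chlorid"])

# Substring match: True wherever "Sodium" appears anywhere in the string.
substring_match = chem.str.contains("Sodium", regex=False)

# Exact match: True only where the whole value equals "Sodium".
exact_match = chem == "Sodium"

print(substring_match.tolist())  # [True, True, False]
print(exact_match.tolist())      # [True, False, False]
```

So for the data in the question both approaches return the same rows, but they diverge as soon as a column contains values such as Sodium_A.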