
pandas equivalent for grep

I'm a newcomer to pandas. For a dataframe like:

N  Chem    Val
A  Sodium  9
B  Sodium  10
A  Chlorid 7
B  Chlorid 10
A  Sodium  17

I'd like to do something like grep in bash, to select the lines containing 'A' in the 1st column and 'Sodium' in the 2nd column:

A  Sodium  9
A  Sodium  17

How should I do this? I guess I need to use df[].str.contains()? Thanks.

LookIntoEast asked Apr 08 '17 01:04


3 Answers

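You can use .str.contains() on a column of the data frame to return a boolean Series. Boolean Series can be combined with the element-wise operators & (and) and | (or). Finally, passing a boolean Series as a key to a data frame returns only the rows where it is True. Note that .str.contains() does a substring match (a regex search by default), so it is True for any cell that contains the pattern, not only for cells equal to it.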

bool1 = df.N.str.contains('A')          # True for rows where N contains 'A'
bool2 = df.Chem.str.contains('Sodium')  # True for rows where Chem contains 'Sodium'
df[bool1 & bool2]   # selects rows where both conditions are True

returns (without including the index):
N  Chem    Val
A  Sodium  9
A  Sodium  17
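
Since contains is a substring test, it behaves like a plain grep rather than an exact comparison. As a rough sketch (the data frame is rebuilt here from the question's sample, and the names mask_sub and mask_exact are only illustrative), .str.fullmatch() can be used when the whole cell has to match, much like grep with ^...$ anchors:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Substring match (regex disabled): True wherever the text occurs in the cell.
mask_sub = (df["N"].str.contains("A", regex=False)
            & df["Chem"].str.contains("Sodium", regex=False))

# Whole-cell match: the pattern must cover the entire string.
mask_exact = df["N"].str.fullmatch("A") & df["Chem"].str.fullmatch("Sodium")

result = df[mask_exact]
print(result)
```

On this sample both masks select the same two rows. They would differ only if a cell held a longer string that merely includes the pattern, for example a hypothetical value such as Sodium_A.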
James answered Sep 27 '22 23:09


In my opinion, using query is the most natural way to express this type of command:

df.query('N == "A" & Chem == "Sodium"')

   N    Chem  Val
0  A  Sodium    9
4  A  Sodium   17
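
If the values to match live in Python variables, query can refer to them with an @ prefix. A minimal sketch, assuming the question's sample data (the variable names name, chem and selected are made up for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "N": ["A", "B", "A", "B", "A"],
    "Chem": ["Sodium", "Sodium", "Chlorid", "Chlorid", "Sodium"],
    "Val": [9, 10, 7, 10, 17],
})

# Values to search for, kept outside the query string.
name = "A"
chem = "Sodium"

# '@' tells query() to look the name up in the surrounding Python scope.
selected = df.query("N == @name and Chem == @chem")
print(selected)
```

This avoids nesting quotes inside the query string and keeps the condition reusable for other values.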
piRSquared answered Sep 28 '22 01:09


If you just mean selecting rows by exact values in both columns, it's better not to use contains. contains is meant for cases where you have to pick out sodium_A, sodium_B, etc. from among other strings, which means it can be slower than a plain equality comparison on multiple columns.

import pandas as pd

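# Your sample data, read without a header row (so the column labels are 0, 1, 2).
# sep=r'\s+' splits on whitespace; delim_whitespace is deprecated in recent pandas.
df = pd.read_table('sample.txt', header=None, sep=r'\s+')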

print(df[(df.loc[:, 0] == 'A') & (df.loc[:, 1] == 'Sodium')])

   0       1   2
1  A  Sodium   9
5  A  Sodium  17
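
For anyone who wants to try the equality approach without creating sample.txt, here is a self-contained sketch. It assumes the question's table is pasted in as a string and, unlike the snippet above, lets pandas parse the first line as the header so the columns can be addressed by name (the names text and exact are only illustrative):

```python
import io

import pandas as pd

# The question's table as whitespace-separated text.
text = """N  Chem    Val
A  Sodium  9
B  Sodium  10
A  Chlorid 7
B  Chlorid 10
A  Sodium  17
"""

# The first line is used as the header; sep=r"\s+" splits on runs of whitespace.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Exact equality on both columns, combined element-wise with &.
exact = df[(df["N"] == "A") & (df["Chem"] == "Sodium")]
print(exact)
```

Because the header is parsed here, the matching rows keep index labels 0 and 4 rather than 1 and 5.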
su79eu7k answered Sep 28 '22 00:09