Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select rows from a pandas dataframe where two columns match list of pairs

Tags:

python

pandas

I'm trying to create a boolean mask (or list of indices) from a dataframe to indicate where multiple columns match some combinations in a list. Here's an example:

import pandas as pd
df = pd.DataFrame({'A': ['alice', 'bob'    , 'charlie'  , 'dave' , 'dave'], 
                   'B': ['andy' , 'bridget', 'charlotte', 'diana', 'andy'], 
                   'C': ['some' , 'other'  , 'stuff'    , 'here' , '!' ]})

pairs = pd.DataFrame({'A': ['alice', 'dave'],
                      'B': ['andy' , 'diana']})

My desired output is either

[True, False, False, True, False]

or

[0, 3]

Critically, I don't want to return row index 4 i.e. ['dave', 'andy', '!']. I can achieve what I want by converting back to lists...but this feels like a long way round and I imagine there's a 'pandas' way to do this!

df_list = df[['A', 'B']].values.tolist()
pairs_list = pairs.values.tolist()
[idx for idx, row in enumerate(df_list) if row in pairs_list]
like image 922
James Owers Avatar asked Jul 19 '16 10:07

James Owers


People also ask

How do I select rows from a DataFrame based on multiple column values?

You can select the Rows from Pandas DataFrame based on column values or based on multiple conditions either using DataFrame. loc[] attribute, DataFrame. query() or DataFrame. apply() method to use lambda function.

How do you select rows of pandas DataFrame based on values in a list?

isin() to Select Rows From List of Values. DataFrame. isin() method is used to filter/select rows from a list of values. You can have the list of values in variable and use it on isin() or use it directly.

How can pandas select rows based on multiple conditions?

To select the rows based on mutiple condition we can use the & operator.In this example we have passed mutiple conditon using this code dfobj. loc[(dobj['Name'] == 'Rack') & (dobj['Marks'] == 100)]. This code will return a subset of dataframe rows where name='Rack' and marks =100.


1 Answers

You could perform a outer type merge with indicator=True param and test whether _merge column == 'both':

In [97]:
merged = df.merge(pairs, how='outer', indicator=True)
merged[merged['_merge'] =='both'].index

Out[97]:
Int64Index([0, 3], dtype='int64')

To get a boolean Series:

In [98]:
merged['_merge'] =='both'

Out[98]:
0     True
1    False
2    False
3     True
4    False
Name: _merge, dtype: bool

the merged df looks like this:

In [99]:
merged

Out[99]:
         A          B      C     _merge
0    alice       andy   some       both
1      bob    bridget  other  left_only
2  charlie  charlotte  stuff  left_only
3     dave      diana   here       both
4     dave       andy      !  left_only
like image 98
EdChum Avatar answered Oct 13 '22 22:10

EdChum