Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create bool mask from filter results in Pandas [duplicate]

Tags:

python

pandas

I know how to create a mask to filter a dataframe when querying a single column:

import pandas as pd
import datetime
index = pd.date_range('2013-1-1',periods=100,freq='30Min')
data = pd.DataFrame(data=list(range(100)), columns=['value'], index=index)
data['value2'] = 'A'
data['value2'].loc[0:10] = 'B'

data

    value   value2
2013-01-01 00:00:00 0   B
2013-01-01 00:30:00 1   B
2013-01-01 01:00:00 2   B
2013-01-01 01:30:00 3   B
2013-01-01 02:00:00 4   B
2013-01-01 02:30:00 5   B
2013-01-01 03:00:00 6   B

I use a simple mask here:

mask = data['value'] > 4
data[mask]
    value   value2
2013-01-01 02:30:00 5   B
2013-01-01 03:00:00 6   B
2013-01-01 03:30:00 7   B
2013-01-01 04:00:00 8   B
2013-01-01 04:30:00 9   B
2013-01-01 05:00:00 10  A

My question is how to create a mask with multiple columns? So if I do this:

data[data['value2'] == 'A' ][data['value'] > 4]

This filters as I would expect but how do I create a bool mask from this as per my other example? I have provided the test data for this but I often want to create a mask on other types of data so Im looking for any pointers please.

like image 777
nipy Avatar asked Aug 06 '16 09:08

nipy


People also ask

How do I get duplicates in pandas?

Pandas DataFrame duplicated() Method The duplicated() method returns a Series with True and False values that describe which rows in the DataFrame are duplicated and not. Use the subset parameter to specify if any columns should not be considered when looking for duplicates.

How do you filter duplicate rows in Python?

Remove All Duplicate Rows from Pandas DataFrame You can set 'keep=False' in the drop_duplicates() function to remove all the duplicate rows. For E.x, df. drop_duplicates(keep=False) .

How do you check if there are duplicate rows in pandas DataFrame?

The pandas. DataFrame. duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean series which identifies whether a row is duplicate or unique.

How do I know if a DataFrame contains duplicates?

To take a look at the duplication in the DataFrame as a whole, just call the duplicated() method on the DataFrame. It outputs True if an entire row is identical to a previous row.


1 Answers

Your boolean masks are boolean (obviously) so you can use boolean operations on them. The boolean operators include (but are not limited to) &, | which can combine your masks based on either an 'and' operation or an 'or' operation. In your specific case, you need an 'and' operation. So you simply write your mask like so:

mask = (data['value2'] == 'A') & (data['value'] > 4) 

This ensures you are selecting those rows for which both conditions are simultaneously satisfied. By replacing the & with |, one can select those rows for which either of the two conditions can be satisfied. You can select your result as usual:

data[mask] 

Although this question is answered by the answer to the question that ayhan points out in his comment, I thought that the OP was lacking the idea of boolean operations.

like image 131
Kartik Avatar answered Sep 19 '22 12:09

Kartik