I know how to create a mask to filter a dataframe when querying a single column:
import pandas as pd
import datetime
index = pd.date_range('2013-1-1',periods=100,freq='30Min')
data = pd.DataFrame(data=list(range(100)), columns=['value'], index=index)
data['value2'] = 'A'
data.loc[data.index[:10], 'value2'] = 'B'
data
                     value value2
2013-01-01 00:00:00      0      B
2013-01-01 00:30:00      1      B
2013-01-01 01:00:00      2      B
2013-01-01 01:30:00      3      B
2013-01-01 02:00:00      4      B
2013-01-01 02:30:00      5      B
2013-01-01 03:00:00      6      B
I use a simple mask here:
mask = data['value'] > 4
data[mask]
                     value value2
2013-01-01 02:30:00      5      B
2013-01-01 03:00:00      6      B
2013-01-01 03:30:00      7      B
2013-01-01 04:00:00      8      B
2013-01-01 04:30:00      9      B
2013-01-01 05:00:00     10      A
My question is: how do I create a mask over multiple columns? For example, if I do this:
data[data['value2'] == 'A' ][data['value'] > 4]
This filters as I would expect, but how do I create a boolean mask from this, as in my other example? I have provided test data for this, but I often want to create masks on other types of data, so I'm looking for any pointers, please.
Your boolean masks are boolean (obviously), so you can use boolean operations on them. The boolean operators include (but are not limited to) & and |, which combine your masks based on either an 'and' operation or an 'or' operation. In your specific case, you need an 'and' operation. So you simply write your mask like so:
mask = (data['value2'] == 'A') & (data['value'] > 4)
This ensures you are selecting those rows for which both conditions are simultaneously satisfied. By replacing the & with |, one can select those rows for which either of the two conditions is satisfied. You can select your result as usual:
data[mask]
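As a rough, self-contained sketch of the above (the names is_a, gt_4, both, either and not_a_but_gt_4 are just illustrative, and the test frame is rebuilt with .loc so that it runs on recent pandas versions), the individual masks can be stored and combined like this:

```python
import pandas as pd

# Rebuild the test data from the question.
index = pd.date_range('2013-01-01', periods=100, freq='30min')
data = pd.DataFrame({'value': list(range(100))}, index=index)
data['value2'] = 'A'
data.loc[data.index[:10], 'value2'] = 'B'  # first ten rows become 'B'

# One boolean Series per condition (illustrative names).
is_a = data['value2'] == 'A'
gt_4 = data['value'] > 4

both = is_a & gt_4             # rows where both conditions hold
either = is_a | gt_4           # rows where at least one condition holds
not_a_but_gt_4 = ~is_a & gt_4  # ~ negates a mask

# Each combined mask is itself a boolean Series and can be reused.
filtered = data[both]
```

When the conditions are written inline rather than stored in variables, each comparison needs its own parentheses, because & and | bind more tightly than == and > in Python.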
Although the question that ayhan links in his comment already answers this one, I thought the OP was missing the idea of boolean operations.