Select from pandas dataframe using boolean series/array

I have a dataframe:

                 High    Low  Close
    Date
    2009-02-11  30.20  29.41  29.87
    2009-02-12  30.28  29.32  30.24
    2009-02-13  30.45  29.96  30.10
    2009-02-17  29.35  28.74  28.90
    2009-02-18  29.35  28.56  28.92

and a boolean series:

       bools
    1   True
    2  False
    3  False
    4   True
    5  False

How can I select from the dataframe using the boolean array to obtain a result like:

                 High
    Date
    2009-02-11  30.20
    2009-02-17  29.35
asked May 21 '16 12:05 by Osora


1 Answer

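For boolean indexing with a pandas Series to work, the Series and the DataFrame must have matching indexes, because pandas aligns the two by index label before filtering. In this case it won't work because the boolean Series has an integer index, while the DataFrame is indexed by dates.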
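The mismatch can be reproduced with a cut-down version of the question's data. This is only a sketch: the one-column `df` and the `aligned_ok` flag are illustrative names, and the exact exception class differs between pandas versions, so the code catches a generic `Exception` and prints its type.

```python
import pandas as pd

# Cut-down version of the question's data: date labels on the DataFrame,
# default integer labels (0..4) on the boolean Series.
df = pd.DataFrame(
    {"High": [30.20, 30.28, 30.45, 29.35, 29.35]},
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
s = pd.Series([True, False, False, True, False], name="bools")

# The two indexes share no labels, so label alignment cannot succeed.
print(s.index.equals(df.index))

try:
    df[s]  # pandas tries to align s with df.index by label
    aligned_ok = True
except Exception as exc:  # the exact exception class depends on the pandas version
    aligned_ok = False
    print(type(exc).__name__, exc)
```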

However, as you say, you can filter using a bool array. You can access the underlying array of a Series via .values. This can then be applied as a filter as follows:

    df  # pandas.DataFrame
    s   # pandas.Series

    df[s.values]  # df, filtered by the bool array in s

For example, with your data:

    import pandas as pd

    df = pd.DataFrame(
        [
            [30.20, 29.41, 29.87],
            [30.28, 29.32, 30.24],
            [30.45, 29.96, 30.10],
            [29.35, 28.74, 28.90],
            [29.35, 28.56, 28.92],
        ],
        columns=['High', 'Low', 'Close'],
        index=['2009-02-11', '2009-02-12', '2009-02-13', '2009-02-17', '2009-02-18'],
    )

    s = pd.Series([True, False, False, True, False], name='bools')

    df[s.values]

Returns the following:

                 High    Low  Close
    2009-02-11  30.20  29.41  29.87
    2009-02-17  29.35  28.74  28.90
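Because a plain array carries no index, the filter above is matched to the rows purely by position, so the array needs exactly one entry per row. The sketch below illustrates that under a few assumptions: it uses `Series.to_numpy()` in place of `.values`, and the names `mask`, `short_mask`, `labelled` and `length_error` are made up for the illustration. It also shows an alternative that is not part of the answer above: re-labelling the mask with the DataFrame's index so that ordinary label alignment works.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"High": [30.20, 30.28, 30.45, 29.35, 29.35]},
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
s = pd.Series([True, False, False, True, False], name="bools")

# A plain boolean array is applied row by row, in order.
mask = s.to_numpy()
filtered = df[mask]
print(filtered)

# A mask of the wrong length cannot be matched by position and is rejected.
short_mask = np.array([True, False])
try:
    df[short_mask]
    length_error = None
except ValueError as exc:
    length_error = exc
    print("rejected:", exc)

# Alternative: attach the DataFrame's labels to the mask, then index by Series.
labelled = pd.Series(mask, index=df.index)
print(df[labelled].equals(filtered))
```

The positional approach is only safe when the Series and the DataFrame are known to be in the same row order.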

If you just want the High column, you can filter this as normal (before, or after the bool filter):

    df['High'][s.values]
    # Or:
    df[s.values]['High']

To get your target output (as a Series):

    2009-02-11    30.20
    2009-02-17    29.35
    Name: High, dtype: float64
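If a one-column DataFrame is wanted rather than a Series (which is closer to the layout shown in the question), one option is to select rows and columns in a single `.loc` call. This is a sketch of an alternative, not part of the answer above; `high_series` and `high_frame` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [30.20, 29.41, 29.87],
        [30.28, 29.32, 30.24],
        [30.45, 29.96, 30.10],
        [29.35, 28.74, 28.90],
        [29.35, 28.56, 28.92],
    ],
    columns=["High", "Low", "Close"],
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
s = pd.Series([True, False, False, True, False], name="bools")

# Row mask and a single column label: the result is a Series.
high_series = df.loc[s.values, "High"]

# Row mask and a list of column labels: the result stays a DataFrame.
high_frame = df.loc[s.values, ["High"]]

print(high_series)
print(high_frame)
```

For reading values, the chained form `df['High'][s.values]` gives the same numbers; a single `.loc` call is generally the safer choice when the selection will later be assigned to.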
answered Sep 25 '22 01:09 by mfitzp