I have a dataframe:
                 High    Low  Close
    Date
    2009-02-11  30.20  29.41  29.87
    2009-02-12  30.28  29.32  30.24
    2009-02-13  30.45  29.96  30.10
    2009-02-17  29.35  28.74  28.90
    2009-02-18  29.35  28.56  28.92
and a boolean series:
       bools
    1   True
    2  False
    3  False
    4   True
    5  False
How could I select from the dataframe using the boolean array to obtain a result like:
                 High
    Date
    2009-02-11  30.20
    2009-02-17  29.35
The bool() method returns the truth value, True or False, of a DataFrame that holds a single boolean element. It only works if the DataFrame has exactly one value and that value is either True or False; otherwise bool() raises a ValueError. The method has been deprecated since pandas 2.1.
Boolean indexing selects data from a DataFrame using a boolean vector. The vector needs one entry per row, and if it is a Series, its index must match the DataFrame's index.
BooleanArray is a pandas extension array for boolean data. Under the hood it is represented by two NumPy arrays: a boolean array with the data and a boolean array with the mask (True indicating missing). BooleanArray implements Kleene logic (sometimes called three-valued logic) for logical operations.
Select data using the location index (.iloc). This means that you can use dataframe.iloc[0:1, 0:1] to select the cell value at the intersection of the first row and first column of the dataframe. You can expand the range for either the row index or the column index to select more data.
For boolean indexing to work between two pandas objects, they must have comparable indexes. In this case it won't work because the Series has an integer index, while the DataFrame is indexed by dates.
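To make the alignment problem concrete, here is a minimal sketch using the High column from the question. It assumes a reasonably recent pandas version; the exact exception class and message can differ between releases, so the sketch catches a generic Exception. The name s_dated is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"High": [30.20, 30.28, 30.45, 29.35, 29.35]},
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
# The boolean Series is labelled 1..5, as in the question, not by date.
s = pd.Series([True, False, False, True, False], index=[1, 2, 3, 4, 5], name="bools")

try:
    df[s]  # pandas tries to align s with df's index by label and fails
    aligned = True
except Exception as exc:
    aligned = False
    print(type(exc).__name__)

# Giving the Series the same index as the DataFrame makes label alignment work.
s_dated = pd.Series(s.values, index=df.index)
result = df[s_dated]
print(result)
```

Re-labelling the Series is one option; passing the raw array skips alignment entirely and relies on position only.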
However, as you say, you can filter using a bool array. You can access the underlying array of a Series via .values. This can then be applied as a filter as follows:
    df            # pandas.DataFrame
    s             # pandas.Series
    df[s.values]  # df, filtered by the bool array in s
For example, with your data:
    import pandas as pd

    df = pd.DataFrame(
        [
            [30.20, 29.41, 29.87],
            [30.28, 29.32, 30.24],
            [30.45, 29.96, 30.10],
            [29.35, 28.74, 28.90],
            [29.35, 28.56, 28.92],
        ],
        columns=['High', 'Low', 'Close'],
        index=['2009-02-11', '2009-02-12', '2009-02-13', '2009-02-17', '2009-02-18'],
    )
    s = pd.Series([True, False, False, True, False], name='bools')

    df[s.values]
Returns the following:
                 High    Low  Close
    2009-02-11  30.20  29.41  29.87
    2009-02-17  29.35  28.74  28.90
If you just want the High column, you can select it as normal (before or after the bool filter):
    df['High'][s.values]
    # Or:
    df[s.values]['High']
To get your target output (as a Series):
    2009-02-11    30.20
    2009-02-17    29.35
    Name: High, dtype: float64
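As an alternative to chaining two indexing steps, the row mask and the column label can be passed to .loc in a single call. This is a sketch of a different spelling, not the approach the answer above uses; it assumes pandas 0.24 or later for Series.to_numpy(), and the variable name high is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [30.20, 29.41, 29.87],
        [30.28, 29.32, 30.24],
        [30.45, 29.96, 30.10],
        [29.35, 28.74, 28.90],
        [29.35, 28.56, 28.92],
    ],
    columns=["High", "Low", "Close"],
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
s = pd.Series([True, False, False, True, False], name="bools")

# Boolean array selects rows by position; "High" selects the column by label.
high = df.loc[s.to_numpy(), "High"]
print(high)
```

Because the mask is a plain NumPy array, pandas does no index alignment here, so the array must have exactly one entry per row of df.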