I am having some trouble masking a Panel in the same way that I would a DataFrame. What I want to do feels simple, but I have not found a way after looking at the docs and online forums. I have a simple example below:
import pandas
import numpy as np
import datetime
start_date = datetime.datetime(2009,3,1,6,29,59)
r = pandas.date_range(start_date, periods=12)
cols_1 = ['AAPL', 'AAPL', 'GOOG', 'GOOG', 'GS', 'GS']
cols_2 = ['close', 'rate', 'close', 'rate', 'close', 'rate']
dat = np.random.randn(12, 6)
dftst = pandas.DataFrame(dat, columns=pandas.MultiIndex.from_arrays([cols_1, cols_2], names=['ticker','field']), index=r)
pn = dftst.T.to_panel().transpose(2,0,1)
print pn
Out[14]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 12 (major_axis) x 3 (minor_axis)
Items axis: close to rate
Major_axis axis: 2009-03-01 06:29:59 to 2009-03-12 06:29:59
Minor_axis axis: AAPL to GS
I now have a Panel object. If I take a slice along the items axis, I get a DataFrame:
close_p = pn['close']
print close_p
Out[16]:
ticker AAPL GOOG GS
2009-03-01 06:29:59 -0.082203 -0.286354 1.227193
2009-03-02 06:29:59 0.340005 -0.688933 -1.505137
2009-03-03 06:29:59 -0.525567 0.321858 -0.035047
2009-03-04 06:29:59 -0.123549 -0.841781 -0.616523
2009-03-05 06:29:59 -0.407504 0.188372 1.311262
2009-03-06 06:29:59 0.272883 0.817179 0.584664
2009-03-07 06:29:59 -1.767227 1.168876 0.443096
2009-03-08 06:29:59 -0.685501 -0.534373 -0.063906
2009-03-09 06:29:59 0.851820 0.068740 0.566537
2009-03-10 06:29:59 0.390678 -0.012422 -0.152375
2009-03-11 06:29:59 -0.985585 -0.917705 -0.585091
2009-03-12 06:29:59 0.067498 -0.764343 0.497270
I can filter this data in two ways:
1) I create a mask and mask the data as follows:
msk = close_p > 0
close_p = close_p.mask(msk)
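2) I can index directly with the boolean condition. Note that this keeps the entries where the condition is True, whereas mask() in 1) replaces those entries with NaN: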
close_p = close_p[close_p > 0]
Out[28]:
ticker AAPL GOOG GS
2009-03-01 06:29:59 NaN NaN 1.227193
2009-03-02 06:29:59 0.340005 NaN NaN
2009-03-03 06:29:59 NaN 0.321858 NaN
2009-03-04 06:29:59 NaN NaN NaN
2009-03-05 06:29:59 NaN 0.188372 1.311262
2009-03-06 06:29:59 0.272883 0.817179 0.584664
2009-03-07 06:29:59 NaN 1.168876 0.443096
2009-03-08 06:29:59 NaN NaN NaN
2009-03-09 06:29:59 0.851820 0.068740 0.566537
2009-03-10 06:29:59 0.390678 NaN NaN
2009-03-11 06:29:59 NaN NaN NaN
2009-03-12 06:29:59 0.067498 NaN 0.497270
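How boolean indexing, where() and mask() relate can be checked on a small frame. The sketch below uses a hypothetical three-row stand-in for close_p (the numbers are rounded from the example above, not the real data) and assumes a pandas version where DataFrame.where and DataFrame.mask are available:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for close_p; values are illustrative only
close_p = pd.DataFrame(
    {
        "AAPL": [-0.08, 0.34, -0.53],
        "GOOG": [-0.29, -0.69, 0.32],
        "GS": [1.23, -1.51, -0.04],
    },
    index=pd.date_range("2009-03-01 06:29:59", periods=3),
)
msk = close_p > 0

kept_by_indexing = close_p[msk]      # NaN where msk is False
kept_by_where = close_p.where(msk)   # same result as boolean indexing
hidden_by_mask = close_p.mask(msk)   # NaN where msk is True

print(kept_by_indexing.equals(kept_by_where))
print(hidden_by_mask.equals(close_p.where(~msk)))
```

So close_p[msk] and close_p.where(msk) keep the positive values, while close_p.mask(msk) hides them; mask(msk) behaves like where(~msk).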
What I cannot figure out how to do is filter all of my data based on a mask without a for loop. I can do the following:
msk = (pn['rate'] > 0) & (pn['close'] > 0)
def mask_panel(pan, msk):
    for item in pan.items:
        pan[item] = pan[item].mask(msk)
    return pan
print pn['close']
Out[32]:
ticker AAPL GOOG GS
2009-03-01 06:29:59 -0.082203 -0.286354 1.227193
2009-03-02 06:29:59 0.340005 -0.688933 -1.505137
2009-03-03 06:29:59 -0.525567 0.321858 -0.035047
2009-03-04 06:29:59 -0.123549 -0.841781 -0.616523
2009-03-05 06:29:59 -0.407504 0.188372 1.311262
2009-03-06 06:29:59 0.272883 0.817179 0.584664
2009-03-07 06:29:59 -1.767227 1.168876 0.443096
2009-03-08 06:29:59 -0.685501 -0.534373 -0.063906
2009-03-09 06:29:59 0.851820 0.068740 0.566537
2009-03-10 06:29:59 0.390678 -0.012422 -0.152375
2009-03-11 06:29:59 -0.985585 -0.917705 -0.585091
2009-03-12 06:29:59 0.067498 -0.764343 0.497270
mask_panel(pn, msk)
print pn['close']
Out[34]:
ticker AAPL GOOG GS
2009-03-01 06:29:59 -0.082203 -0.286354 NaN
2009-03-02 06:29:59 NaN -0.688933 -1.505137
2009-03-03 06:29:59 -0.525567 NaN -0.035047
2009-03-04 06:29:59 -0.123549 -0.841781 -0.616523
2009-03-05 06:29:59 -0.407504 NaN NaN
2009-03-06 06:29:59 NaN NaN NaN
2009-03-07 06:29:59 -1.767227 NaN NaN
2009-03-08 06:29:59 -0.685501 -0.534373 -0.063906
2009-03-09 06:29:59 NaN NaN NaN
2009-03-10 06:29:59 NaN -0.012422 -0.152375
2009-03-11 06:29:59 -0.985585 -0.917705 -0.585091
2009-03-12 06:29:59 NaN -0.764343 NaN
So the above loop does the trick. I know there is a faster vectorized way of doing this using the ndarray, but I have not put that together yet. It also seems like this should be functionality that is built into the pandas library. If there is a way to do this that I am missing, any suggestions would be much appreciated.
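For comparison, here is a minimal sketch of the same operation without a loop, assuming the data is kept as a DataFrame with (ticker, field) MultiIndex columns, like dftst, rather than as a Panel. The names df, msk_wide and masked are made up for this illustration, and the random data is not the data shown above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["AAPL", "GOOG", "GS"], ["close", "rate"]], names=["ticker", "field"]
)
index = pd.date_range("2009-03-01 06:29:59", periods=4)
df = pd.DataFrame(rng.standard_normal((4, 6)), index=index, columns=columns)

# One 2-D mask (dates x tickers), built the same way as in the loop version
close = df.xs("close", axis=1, level="field")
rate = df.xs("rate", axis=1, level="field")
msk = (close > 0) & (rate > 0)

# Repeat each ticker's mask column once per field so it lines up with df.columns
tickers = df.columns.get_level_values("ticker")
msk_wide = msk.loc[:, tickers].to_numpy()

# mask() sets entries to NaN where the condition is True, as in mask_panel
masked = df.mask(msk_wide)
print(masked)
```

Passing the mask as a plain boolean array avoids label alignment between the single-level mask columns and the two-level data columns.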
Pandas provides a feature called boolean masks that lets you filter DataFrames based on conditions. With this, we can write simple queries to filter our data. In this article, we will learn how to use boolean masks to filter rows in our DataFrame.
Pandas DataFrame mask() Method. The mask() method replaces the values where the condition evaluates to True. The mask() method is the opposite of the where() method.
mask() returns an object of the same shape as self whose entries are taken from self where cond is False and from other otherwise. The other object can be a scalar, a Series, a DataFrame, or a callable. The mask method is an application of the if-then idiom.
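A short sketch of the other argument described above, using a made-up two-column frame (df and the result names are illustrative only). It shows a scalar replacement, a callable condition with a DataFrame replacement, and the equivalent where() call:

```python
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 2.0, -3.0], "b": [4.0, -5.0, 6.0]})

# Scalar replacement: entries where the condition is True become 0.0
clipped = df.mask(df < 0, other=0.0)

# Callable condition, with a same-shaped DataFrame supplying the replacements
flipped = df.mask(lambda frame: frame < 0, other=-df)

# where() with the inverted condition gives the same result as the first call
same_as_clipped = df.where(df >= 0, other=0.0)

print(clipped)
print(flipped)
```

In "if-then" terms: if the condition holds, take the value from other, else keep the original value.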
In pandas, Panel was the container for three-dimensional data (it was deprecated in pandas 0.20 and removed in 0.25). The names of the three axes (items, major_axis, minor_axis) are intended to give some semantic meaning to operations involving panel data, in particular econometric analysis of panel data. Panel.shape returns a tuple of the axis dimensions.
I think this will work (it is what Panel.where should do, but it's a bit non-trivial because it has to handle a bunch of cases):
# construct the mask in 2-d (a frame)
In [36]: mask = (pn['close']>0) & (pn['rate']>0)
In [37]: mask
Out[37]:
ticker AAPL GOOG GS
2009-03-01 06:29:59 False False False
2009-03-02 06:29:59 False False True
....
# here's the key, this broadcasts, setting the values which
# don't meet the condition to nan
In [38]: masked_values = np.where(mask,pn.values,np.nan)
# reconstruct the panel (_construct_axes_dict is an internal function that returns a
# dict of the axes, e.g. items -> the items, major_axis -> ...)
In [42]: x = pd.Panel(masked_values,**pn._construct_axes_dict())
Out[42]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 12 (major_axis) x 3 (minor_axis)
Items axis: close to rate
Major_axis axis: 2009-03-01 06:29:59 to 2009-03-12 06:29:59
Minor_axis axis: AAPL to GS
# the values
In [43]: x.values
Out[43]:
array([[[ nan, nan, nan],
[ nan, nan, 0.09575723],
[ nan, nan, nan],
[ nan, nan, nan],
[ nan, 2.07229823, 0.04347515],
[ nan, nan, nan],
[ nan, nan, 2.18342239],
[ nan, nan, 1.73674381],
[ nan, 2.01173087, nan],
[ 0.24109645, 0.94583072, nan],
[ 0.36953467, nan, 0.18044432],
[ 1.74164222, 1.02314752, 1.73736033]],
[[ nan, nan, nan],
[ nan, nan, 0.06960387],
[ nan, nan, nan],
[ nan, nan, nan],
[ nan, 0.63202199, 0.56724391],
[ nan, nan, nan],
[ nan, nan, 0.71964824],
[ nan, nan, 1.03482927],
[ nan, 0.18256148, nan],
[ 1.29451667, 0.49804327, nan],
[ 2.04726538, nan, 0.12883128],
[ 0.70647885, 0.7277734 , 0.77844475]]])
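The broadcasting step can be reproduced with plain NumPy, which also works on pandas versions that no longer ship Panel. This is a minimal sketch with a smaller, randomly generated array standing in for pn.values (items x major_axis x minor_axis); the names values, mask and masked are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal((2, 4, 3))   # stand-in for pn.values

close, rate = values[0], values[1]
mask = (close > 0) & (rate > 0)           # shape (4, 3)

# The (4, 3) mask is broadcast against the trailing axes of the (2, 4, 3)
# array, so the same mask is applied to every item.
masked = np.where(mask, values, np.nan)

print(masked.shape)
```

Broadcasting lines up trailing dimensions, which is why a (major_axis, minor_axis) mask works as is. A mask over (items, major_axis) would need an explicit trailing axis, e.g. mask[:, :, None].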