pandas DataFrame:
Constructor:
import pandas as pd
c = pd.MultiIndex.from_product([['AAPL','AMZN'],['price','custom']])
i = pd.date_range(start='2017-01-01',end='2017-01-06')
df1 = pd.DataFrame(index=i,columns=c)
df1.loc[:,('AAPL','price')] = list(range(51,57))
df1.loc[:,('AMZN','price')] = list(range(101,107))
df1.loc[:,('AAPL','custom')] = list(range(1,7))
df1.loc[:,('AMZN','custom')] = list(range(17,23))
df1.index.set_names('Dates',inplace=True)
df1.sort_index(axis=1,level=0,inplace=True) # needed for pd.IndexSlice[]
df1
Produces:
             AAPL         AMZN
           custom price custom price
Dates
2017-01-01      1    51     17   101
2017-01-02      2    52     18   102
2017-01-03      3    53     19   103
2017-01-04      4    54     20   104
2017-01-05      5    55     21   105
2017-01-06      6    56     22   106
Question:
How can I create a third column at the second level of the MultiIndex that is the difference between price and custom? This should be calculated separately for each top column level, i.e. separately for AAPL and AMZN.
Attempted Solutions:
I tried using pd.IndexSlice in two ways; both give me all NaNs:
df1.loc[:,pd.IndexSlice[:,'price']].sub(df1.loc[:,pd.IndexSlice[:,'custom']])
df1.loc[:,pd.IndexSlice[:,'price']] - df1.loc[:,pd.IndexSlice[:,'custom']]
Returns:
             AAPL         AMZN
           custom price custom price
Dates
2017-01-01    NaN   NaN    NaN   NaN
2017-01-02    NaN   NaN    NaN   NaN
2017-01-03    NaN   NaN    NaN   NaN
2017-01-04    NaN   NaN    NaN   NaN
2017-01-05    NaN   NaN    NaN   NaN
2017-01-06    NaN   NaN    NaN   NaN
How can I add a third column with the difference?
Thanks.
We can create a function specifically for subtracting the columns: it takes the names of the two columns as arguments and is applied once per top-level label (AAPL, AMZN), for example with the apply method or a plain loop, so that each difference is computed separately.
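A function of this kind could look like the sketch below. The helper name add_difference and its parameters are hypothetical, chosen for illustration, and it uses a plain loop over the top-level labels rather than apply. It assumes every top-level label has both a price and a custom column.

```python
import pandas as pd


def add_difference(df, minuend="price", subtrahend="custom", new_name="diff"):
    """Return a copy of df with one difference column per top-level label.

    Hypothetical helper: assumes two-level columns where every top-level
    label has both a `minuend` and a `subtrahend` column.
    """
    out = df.copy()
    # The iterable is evaluated once, before any new columns are added.
    for ticker in out.columns.get_level_values(0).unique():
        out[(ticker, new_name)] = out[(ticker, minuend)] - out[(ticker, subtrahend)]
    # Sort so each new column sits next to its ticker's other columns.
    return out.sort_index(axis=1)


columns = pd.MultiIndex.from_product([["AAPL", "AMZN"], ["price", "custom"]])
index = pd.date_range("2017-01-01", periods=3, name="Dates")
df = pd.DataFrame(
    [[51, 1, 101, 17], [52, 2, 102, 18], [53, 3, 103, 19]],
    index=index,
    columns=columns,
)

result = add_difference(df)
print(result)
```

Selecting a single column with a full tuple such as (ticker, 'price') returns a Series, so the subtraction inside the loop aligns only on the row index and no column-label mismatch can occur.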
The sub() method of a pandas DataFrame subtracts another object (a scalar, a Series, or another DataFrame) from the DataFrame element by element. Calling sub() is equivalent to using the binary subtraction operator (-), except that sub() also accepts a fill_value parameter that is substituted for missing values (np.nan, None) before the subtraction. In both cases pandas first aligns the two operands on their row and column labels. That is why the two attempts in the question return all NaNs: the labels ('AAPL', 'price') and ('AAPL', 'custom') never match, so every column of the result has a value on only one side.
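Both sub() and the - operator align on row and column labels before subtracting, and any label present on only one side yields NaN. A minimal sketch with two made-up frames (left and right are illustrative names, not from the question) shows the effect, along with fill_value:

```python
import pandas as pd

# Two toy frames that share only column "b".
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]})

# The result has columns a, b, c. Only "b" exists on both sides,
# so "a" and "c" are entirely NaN.
diff = left.sub(right)

# The operator form gives the same result.
same = left - right

# fill_value=0 stands in for the side that is missing a label,
# so "a" is left's values minus 0 and "c" is 0 minus right's values.
filled = left.sub(right, fill_value=0)

print(diff)
print(filled)
```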
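You can avoid the label alignment by subtracting the underlying values (a NumPy array) of the custom columns, so the subtraction is done by position and the result keeps the price column labels: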
df1.loc[:, pd.IndexSlice[:, 'price']] - df1.loc[:,pd.IndexSlice[:,'custom']].values
To join it back, you can use pd.concat:
In [221]: df2 = (df1.loc[:, pd.IndexSlice[:, 'price']] - df1.loc[:,pd.IndexSlice[:,'custom']].values)\
.rename(columns={'price' : 'new'})
In [222]: pd.concat([df1, df2], axis=1)
Out[222]:
             AAPL         AMZN       AAPL AMZN
           custom price custom price  new  new
Dates
2017-01-01      1    51     17   101   50   84
2017-01-02      2    52     18   102   50   84
2017-01-03      3    53     19   103   50   84
2017-01-04      4    54     20   104   50   84
2017-01-05      5    55     21   105   50   84
2017-01-06      6    56     22   106   50   84
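The .values approach subtracts by position, so it relies on the price and custom slices listing the tickers in the same order. As an alternative sketch (not part of the answer above), DataFrame.xs can drop the second column level so that the subtraction aligns on the ticker labels themselves. The column name 'diff' and the variable names are chosen here for illustration.

```python
import pandas as pd

# Rebuild the frame from the question; tuple keys become MultiIndex columns.
index = pd.date_range(start="2017-01-01", end="2017-01-06", name="Dates")
df1 = pd.DataFrame(
    {
        ("AAPL", "price"): list(range(51, 57)),
        ("AAPL", "custom"): list(range(1, 7)),
        ("AMZN", "price"): list(range(101, 107)),
        ("AMZN", "custom"): list(range(17, 23)),
    },
    index=index,
)

# xs drops the selected level, leaving plain ticker columns (AAPL, AMZN).
price = df1.xs("price", axis=1, level=1)
custom = df1.xs("custom", axis=1, level=1)

# Both frames now share the same column labels, so alignment works.
diff = price - custom

# Restore the second level with the new column name, then join and sort.
diff.columns = pd.MultiIndex.from_product([diff.columns, ["diff"]])
result = pd.concat([df1, diff], axis=1).sort_index(axis=1)
print(result)
```

Sorting the columns at the end places each diff column under its own ticker instead of appending the new columns on the right.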