
Pandas Column Multiindex Subtracting Columns from each other

pandas DataFrame:

Constructor:

import pandas as pd

c = pd.MultiIndex.from_product([['AAPL','AMZN'],['price','custom']])
i = pd.date_range(start='2017-01-01',end='2017-01-06')
df1 = pd.DataFrame(index=i,columns=c)

df1.loc[:,('AAPL','price')] = list(range(51,57))
df1.loc[:,('AMZN','price')] = list(range(101,107))
df1.loc[:,('AAPL','custom')] = list(range(1,7))
df1.loc[:,('AMZN','custom')] = list(range(17,23))
df1.index.set_names('Dates',inplace=True)
df1.sort_index(axis=1,level=0,inplace=True) # needed for pd.IndexSlice[]

df1

Produces:

             AAPL         AMZN
           custom price custom price
Dates
2017-01-01      1    51     17   101
2017-01-02      2    52     18   102
2017-01-03      3    53     19   103
2017-01-04      4    54     20   104
2017-01-05      5    55     21   105
2017-01-06      6    56     22   106

Question: How can I create a 3rd column at the 2nd level of the MultiIndex that is the difference between price and custom? This should be calculated separately for each top column level, i.e. separately for AAPL and AMZN.

Attempted Solutions:

I tried using pd.IndexSlice in two ways; both return all NaNs:

df1.loc[:,pd.IndexSlice[:,'price']].sub(df1.loc[:,pd.IndexSlice[:,'custom']])
df1.loc[:,pd.IndexSlice[:,'price']] - df1.loc[:,pd.IndexSlice[:,'custom']]

Returns:

             AAPL         AMZN
           custom price custom price
Dates
2017-01-01    NaN   NaN    NaN   NaN
2017-01-02    NaN   NaN    NaN   NaN
2017-01-03    NaN   NaN    NaN   NaN
2017-01-04    NaN   NaN    NaN   NaN
2017-01-05    NaN   NaN    NaN   NaN
2017-01-06    NaN   NaN    NaN   NaN

How can I add a third column with the difference?

Thanks.

Josh D asked Aug 18 '17 18:08



1 Answer

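The NaNs come from label alignment: pandas matches columns by label before subtracting, and ('AAPL', 'price') never matches ('AAPL', 'custom'). You might consider subtracting the underlying values instead, which makes the operation positional: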

df1.loc[:, pd.IndexSlice[:, 'price']] - df1.loc[:,pd.IndexSlice[:,'custom']].values
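The difference between the two behaviours can be seen side by side. This is a minimal sketch on a three-row frame built directly from integers; the names `df`, `price`, `custom`, `aligned` and `positional` are illustrative and not from the question, and `.to_numpy()` is used as the equivalent of `.values` here.

```python
import pandas as pd

# Small stand-in for df1 (illustrative data, columns already sorted).
cols = pd.MultiIndex.from_product([['AAPL', 'AMZN'], ['custom', 'price']])
idx = pd.date_range('2017-01-01', periods=3, name='Dates')
df = pd.DataFrame([[1, 51, 17, 101],
                   [2, 52, 18, 102],
                   [3, 53, 19, 103]], index=idx, columns=cols)

price = df.loc[:, pd.IndexSlice[:, 'price']]
custom = df.loc[:, pd.IndexSlice[:, 'custom']]

# Label-aligned subtraction: no column label appears in both operands,
# so the result holds the union of the columns and every cell is NaN.
aligned = price - custom

# Positional subtraction: the right-hand side is a plain NumPy array,
# so column k of `price` is paired with column k of `custom`.
positional = price - custom.to_numpy()

print(aligned)
print(positional)
```

The positional form relies on both slices listing the tickers in the same order, which holds here because the columns were sorted before slicing.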

To join it back, you can use pd.concat:

In [221]: df2 = (df1.loc[:, pd.IndexSlice[:, 'price']] - df1.loc[:,pd.IndexSlice[:,'custom']].values)\
                            .rename(columns={'price' : 'new'})

In [222]: pd.concat([df1, df2], axis=1)
Out[222]: 
             AAPL         AMZN       AAPL AMZN
           custom price custom price  new  new
Dates                                         
2017-01-01      1    51     17   101   50   84
2017-01-02      2    52     18   102   50   84
2017-01-03      3    53     19   103   50   84
2017-01-04      4    54     20   104   50   84
2017-01-05      5    55     21   105   50   84
2017-01-06      6    56     22   106   50   84
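The concatenated frame leaves the new columns at the far right. If they should sit under each ticker, one possible variation is to take cross-sections with `DataFrame.xs`, which drops the second level so the subtraction aligns by ticker, and then sort the columns. This is a sketch of a different technique from the `.values` approach above, on illustrative data; `df`, `diff` and `result` are assumed names.

```python
import pandas as pd

# Small stand-in for df1 (illustrative data).
cols = pd.MultiIndex.from_product([['AAPL', 'AMZN'], ['custom', 'price']])
idx = pd.date_range('2017-01-01', periods=3, name='Dates')
df = pd.DataFrame([[1, 51, 17, 101],
                   [2, 52, 18, 102],
                   [3, 53, 19, 103]], index=idx, columns=cols)

# xs drops the selected level, leaving columns ['AAPL', 'AMZN'] on both
# sides, so ordinary label alignment pairs each ticker with itself.
diff = df.xs('price', axis=1, level=1) - df.xs('custom', axis=1, level=1)

# Restore a second column level named 'new' so the frame matches df's shape.
diff.columns = pd.MultiIndex.from_product([diff.columns, ['new']])

# Concatenate and sort the columns so 'new' lands under each ticker.
result = pd.concat([df, diff], axis=1).sort_index(axis=1)

print(result)
```

Sorting is lexicographic, so each ticker ends up with its columns in the order custom, new, price.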
cs95 answered Nov 15 '22 05:11