 

Performing arithmetic with a multi-index pandas dataframe that needs broadcasting at several levels

Tags: python, pandas

I have a dataframe that looks like this:

   one                     two                    three                   
     1         2             1             2          1             2     
     X    Y    X    Y        X    Y        X    Y     X        Y    X    Y
a  0.3 -0.6 -0.3 -0.2  1.5e+00  0.3 -1.0e+00  1.2   0.6 -9.8e-02 -0.4  0.4
b -0.6 -0.4 -1.1  2.3 -7.4e-02  0.7 -7.4e-02 -0.5  -0.3 -6.8e-01  1.1 -0.1

How do I divide all elements of df by df["three"]?

I tried df.div(df["three"],level=[1,2]) with no luck.

avs asked Sep 27 '22 02:09


2 Answers

Here's a one-liner:

df / pd.concat([df['three']] * 3, axis=1).values
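As a self-contained check of the one-liner, here is a minimal sketch on a made-up frame with the question's three-level column layout (the values 1 to 24 are arbitrary). It computes the number of top-level groups instead of hard-coding 3; that tweak is mine, not part of the answer. Like the original, it assumes every group lists its subcolumns in the same order as 'three', because .values makes the division purely positional.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question's layout: 3 x 2 x 2 = 12 columns.
cols = pd.MultiIndex.from_product([['one', 'two', 'three'], ['1', '2'], ['X', 'Y']])
df = pd.DataFrame(np.arange(1.0, 25.0).reshape(2, 12), index=['a', 'b'], columns=cols)

# Number of top-level groups (3 here) instead of a hard-coded constant.
n_groups = df.columns.get_level_values(0).nunique()

# Tile the 'three' block once per group; .values drops the labels,
# so df is divided element by element, by position.
result = df / pd.concat([df['three']] * n_groups, axis=1).values
print(result['three'])  # every entry is 1.0
```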

And here's another way that is a little less concise but may be more readable.

df2 = df.copy()
for c in df.columns.levels[0]:
    df2[c] = df[c] / df['three']
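A runnable sketch of the loop on the same kind of made-up frame. Iterating over get_level_values(0).unique() instead of levels[0] is my substitution: levels can keep labels that no longer appear in the columns (for example after slicing), while this only visits groups that are actually present. Each assignment writes a sub-frame back under a top-level key, which relies on pandas matching it to the lower column levels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the question's 3-level column layout.
cols = pd.MultiIndex.from_product([['one', 'two', 'three'], ['1', '2'], ['X', 'Y']])
df = pd.DataFrame(np.arange(1.0, 25.0).reshape(2, 12), index=['a', 'b'], columns=cols)

df2 = df.copy()
# Only the top-level labels actually present in the columns, in column order.
for c in df.columns.get_level_values(0).unique():
    # Read from df (not df2) so the 'three' denominator is never overwritten;
    # df[c] and df['three'] share the same lower-level labels, so they align.
    df2[c] = df[c] / df['three']

print(df2)
```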

Finally, here is a longer solution with more explanation. I originally did it this way before realizing there were better ways, but I'll keep it here because it shows more about what happens behind the scenes in an operation like this (though it is possibly overkill).

First off, a MultiIndex dataframe doesn't copy well from printed output, so I'll create a sample dataframe that is fairly similar:

import numpy as np
import pandas as pd
np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

first        one                 two               three          
second       foo       bar       foo       bar       foo       bar
A      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437
B      -2.426679 -0.428913  1.265936 -0.866740 -0.678886 -0.094709
C       1.491390 -0.638902 -0.443982 -0.434351  2.205930  2.186786

The simplest approach is probably to tile the denominator three times so that it matches the shape of the full dataframe. Alternatively, you could loop over the columns, but then you have to recombine the pieces afterwards, which may not be as easy as you'd think with a MultiIndex. So broadcast column 'three' like this:

denom = pd.concat([df['three']] * 3, axis=1)
denom = pd.DataFrame(denom.values, columns=df.columns, index=df.index)

first        one                 two               three          
second       foo       bar       foo       bar       foo       bar
A      -0.578600  1.651437 -0.578600  1.651437 -0.578600  1.651437
B      -0.678886 -0.094709 -0.678886 -0.094709 -0.678886 -0.094709
C       2.205930  2.186786  2.205930  2.186786  2.205930  2.186786

The first 'denom' line tiles the 'three' block so that it has the same shape as the existing dataframe. The second line is needed to give it the same row and column labels as df, because pandas aligns on labels when dividing one dataframe by another. Now you can write a normal division:

df / denom

first        one                 two           three    
second       foo       bar       foo       bar   foo bar
A       1.876305  0.603926 -0.489074 -0.912112     1   1
B       3.574501  4.528744 -1.864725  9.151619     1   1
C       0.676082 -0.292165 -0.201267 -0.198625     1   1
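The relabeling step can be checked directly. This sketch rebuilds the two-level sample (MultiIndex.from_product yields the same column order as the tuples above) and inspects the intermediate concatenated frame, called tiled here for illustration. Straight out of pd.concat its columns are just the 'second' labels repeated, so it only lines up with df after being rebuilt with df's labels.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
index = pd.MultiIndex.from_product([['one', 'two', 'three'], ['foo', 'bar']],
                                   names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

# Step 1: tile the 'three' block; only the 'second' level survives.
tiled = pd.concat([df['three']] * 3, axis=1)
print(list(tiled.columns))  # ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']

# Step 2: reuse the same values under df's own row and column labels.
denom = pd.DataFrame(tiled.values, columns=df.columns, index=df.index)

result = df / denom
print(result['three'])  # all ones
```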

A quick note on how the one-liner relates to this longer solution: .values in the one-liner converts the dataframe to a NumPy array, which has the convenient side effect of discarding the row and column labels, so the division is done by position. In this longer solution, I conform the indices explicitly instead. Depending on your situation, either approach could be the better choice.
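To see that the positional and the label-conforming routes agree, this sketch computes the division both ways on the same two-level sample. Note that the label-conforming version also attaches df's labels to the tiled values by position, so both versions assume 'three' lists foo and bar in the same order as the other groups.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
index = pd.MultiIndex.from_product([['one', 'two', 'three'], ['foo', 'bar']],
                                   names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

tiled = pd.concat([df['three']] * 3, axis=1)

# Positional: .values is a bare NumPy array with no labels to align on.
positional = df / tiled.values

# Label-based: give the tiled values df's labels, then divide with alignment.
labelled = df / pd.DataFrame(tiled.values, columns=df.columns, index=df.index)

print(type(tiled.values))  # <class 'numpy.ndarray'>
print(np.allclose(positional.values, labelled.values))  # True
```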

JohnE answered Oct 13 '22 00:10


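Another approach is to use stack() to move the inner column levels into the row index, divide every column by the resulting 'three' column (a pd.Series) along the rows, and then restore the original structure using unstack().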

First I'll create a dataframe with the same structure as your example:

import collections, itertools
import pandas as pd
elems = itertools.count()
df = pd.DataFrame(collections.OrderedDict(((a, b, c), {'a': next(elems), 'b': next(elems)}) for a in ['one', 'two', 'three'] for b in ['1', '2'] for c in ['X', 'Y']))

This gives:

  one          two             three
    1     2      1       2         1       2
    X  Y  X  Y   X   Y   X   Y     X   Y   X   Y
a   0  2  4  6   8  10  12  14    16  18  20  22
b   1  3  5  7   9  11  13  15    17  19  21  23

Then to do the division:

df_stacked = df.stack(level=[1, 2])
result_stacked = df_stacked.div(df_stacked['three'], axis=0)
result = result_stacked.unstack(level=[-2, -1])

It's necessary to reindex the result, since stack/unstack may change the column ordering (for example, by sorting the level values).

result = result.reindex_like(df)

This gives:

        one                                     two                               three
          1                   2                   1                   2               1         2
          X         Y         X         Y         X         Y         X         Y     X    Y    X    Y
a  0.000000  0.111111  0.200000  0.272727  0.500000  0.555556  0.600000  0.636364   1.0  1.0  1.0  1.0
b  0.058824  0.157895  0.238095  0.304348  0.529412  0.578947  0.619048  0.652174   1.0  1.0  1.0  1.0
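A self-contained sketch of the whole stack/unstack route, rebuilding the example with a plain dict (which keeps insertion order on Python 3.7+) instead of OrderedDict; that swap and the name data are mine. Stacking levels 1 and 2 turns the 2 x 12 frame into 8 rows (one per row label and (1/2, X/Y) pair) with one column per top-level group. Newer pandas versions may emit a FutureWarning about the stack implementation at that step.

```python
import itertools
import pandas as pd

# Rebuild the example: 12 columns keyed by (a, b, c) tuples, rows 'a' and 'b'.
elems = itertools.count()
data = {(a, b, c): {'a': next(elems), 'b': next(elems)}
        for a in ['one', 'two', 'three'] for b in ['1', '2'] for c in ['X', 'Y']}
df = pd.DataFrame(data)

# Move column levels 1 and 2 into the row index: 8 rows x 3 columns.
df_stacked = df.stack(level=[1, 2])
print(df_stacked.shape)  # (8, 3)

# Divide every column by the 'three' column, row by row.
result_stacked = df_stacked.div(df_stacked['three'], axis=0)

# Move the two inner levels back to the columns and restore df's column order.
result = result_stacked.unstack(level=[-2, -1]).reindex_like(df)
print(result)
```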
Jack Valmadre answered Oct 13 '22 00:10