Pass Multiple Columns to groupby.transform

Question

I understand that when you call a groupby.transform with a DataFrame column, the column is passed to the function that transforms the data. But what I cannot understand is how to pass multiple columns to the function.

people = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

Now I can easily demean that data etc. but what I can't seem to do properly is to transform data inside groups using multiple column values as parameters of the function. For example if I wanted to add a column 'f' that took the value a.mean() - b.mean() * c for each observation how can that be achived using the transform method.

I have tried variants of the following

people['f'] = float(NA)
Grouped = people.groupby(key)
def TransFunc(col1, col2, col3):
    return col1.mean() - col2.mean() * col3
Grouped.f.transform(TransFunc(Grouped['a'], Grouped['b'], Grouped['c']))

But this is clearly wrong. I have also trued to wrap the function in a lamba but can't quite make that work either.

I am able to achieve the result by iterating through the groups in the following manner:

for group in Grouped:
    Amean = np.mean(list(group[1].a))
    Bmean = np.mean(list(group[1].b))
    CList = list(group[1].c)
    IList = list(group[1].index)

    for y in xrange(len(CList)):
        people['f'][IList[y]] = (Amean - Bmean) * CList[y]

But that does not seem a satisfactory solution, particulalry if the index is non-unique. Also I know this must be possible using groupby.transform.

To generalise the question: how does one write functions for transforming data that have parameters that involve using values from multiple columns?

Help appreciated.

HYRY · Accepted Answer

You can use apply() method:

import numpy as np
import pandas as pl
np.random.seed(0)

people2 = pd.DataFrame(np.random.randn(5, 5), 
                      columns=['a', 'b', 'c', 'd', 'e'], 
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

Grouped = people2.groupby(key)

def f(df):
    df["f"] = (df.a.mean() - df.b.mean())*df.c
    return df

people2 = Grouped.apply(f)
print people2

If you want some generalize method:

Grouped = people2.groupby(key)

def f(a, b, c, **kw):
    return (a.mean() - b.mean())*c

people2["f"] = Grouped.apply(lambda df:f(**df))
print people2

Pass Multiple Columns to groupby.transform

Tags:

python

pandas

Woody Pride

1 Answers

HYRY

Recent Activity

Donate For Us

Pass Multiple Columns to groupby.transform

Tags:

python

pandas

Woody Pride

1 Answers

HYRY

Related questions

Recent Activity

Donate For Us