How to refer to current version of pandas dataframe in chained operations

Question

Let's say I have the following dataset:

import pandas as pd
import numpy as np

df = pd.read_csv("https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv")

df["tip_fcst"] = np.random.uniform(low=0, high=0.40, size=len(df))
df["tip_fcst"] = df.tip_fcst * df.total_bill

df.head(5)
   total_bill   tip     sex smoker  day    time  size  tip_fcst
0       16.99  1.01  Female     No  Sun  Dinner     2  1.123689
1       10.34  1.66    Male     No  Sun  Dinner     3  3.125474
2       21.01  3.50    Male     No  Sun  Dinner     3  2.439321
3       23.68  3.31    Male     No  Sun  Dinner     2  3.099715
4       24.59  3.61  Female     No  Sun  Dinner     4  1.785596

I am then performing the following operations:

# Note: .ix was removed in pandas 1.0; .loc is the label-based equivalent here.
time_table = (
    df
    .groupby("time")
    .agg({"tip": lambda x:
              df.loc[x.index].tip.sum() / df.loc[x.index].total_bill.sum(),
          "tip_fcst": lambda x:
              df.loc[x.index].tip_fcst.sum() / df.loc[x.index].total_bill.sum()
         })
)

What I would like to do is add another step using assign to create a new variable called difference. The problem I'm having is that I don't know how to reference the "current version" of the dataframe to use the newly created variables. I realize that I could just save what I have so far to time_table and then use time_table["difference"] = time_table.tip_fcst - time_table.tip, but I like this flow of chained operations and was hoping there was a way to do it within there. Is this possible?

Nickil Maveli · Accepted Answer

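You can chain these all together by passing a lambda function to assign. The lambda receives the DataFrame as it exists at that point in the chain (here, the aggregated result), so the newly created columns are available inside it: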

(df.groupby("time")
   .agg({"tip": lambda x: df.loc[x.index].tip.sum() / df.loc[x.index].total_bill.sum(),
         "tip_fcst": lambda x: df.loc[x.index].tip_fcst.sum() / df.loc[x.index].total_bill.sum()})
   # x below is the aggregated DataFrame, not the original df
   .assign(difference=lambda x: x.tip_fcst - x.tip))
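
As a rough, self-contained sketch of the same idea, the chain below runs on a small made-up frame rather than the tips file (the four rows and their values are hypothetical, chosen only for illustration). It also swaps in a slightly different aggregation: it sums the columns per group first and then computes the ratios inside assign, so no lambda needs to reach back to the outer df. This is one possible arrangement, not the only way to write it.

```python
import pandas as pd

# Hypothetical stand-in for the tips data; values are invented for illustration.
df = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
    "total_bill": [20.0, 30.0, 10.0, 40.0],
    "tip": [3.0, 5.0, 1.0, 4.0],
    "tip_fcst": [4.0, 6.0, 2.0, 3.0],
})

time_table = (
    df
    .groupby("time")[["total_bill", "tip", "tip_fcst"]]
    .sum()
    # Each lambda receives the DataFrame as it exists at this step of the chain.
    .assign(
        tip=lambda d: d.tip / d.total_bill,
        tip_fcst=lambda d: d.tip_fcst / d.total_bill,
    )
    # The ratio columns created above are visible to this later step.
    .assign(difference=lambda d: d.tip_fcst - d.tip)
    .drop(columns="total_bill")
)

print(time_table)
```

Because assign calls each callable with the intermediate DataFrame, no temporary variable is needed between steps.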

