I have this kind of data frame
dat = [{"date": datetime.date(2021,1,1), "c_id" : "a", "var1": 2, "var2": 1, "var3" : 10 },
{"date": datetime.date(2021,1,1), "c_id" : "b", "var1": 2, "var2": 0, "var3" : 20 },
{"date": datetime.date(2021,2,1), "c_id" : "a", "var1": 2, "var2": 1, "var3" : 30 },
{"date": datetime.date(2021,2,1), "c_id" : "b", "var1": 2, "var2": 3, "var3" : 10 },
{"date": datetime.date(2021,3,1), "c_id" : "a", "var1": 2, "var2": 1, "var3" : 30 },
{"date": datetime.date(2021,3,1), "c_id" : "b", "var1": 2, "var2": 3, "var3" : 20 },
]
df = pd.DataFrame(dat)
>>> df
date c_id var1 var2 var3
0 2021-01-01 a 2 1 10
1 2021-01-01 b 2 0 20
2 2021-02-01 a 2 1 30
3 2021-02-01 b 2 3 10
4 2021-03-01 a 2 1 30
5 2021-03-01 b 2 3 20
I'd like to have the share of these 3 named variables per (date, c_id). So for example...
>>> df
date c_id var1 var2 var3 var1_share var2_share var3_share
0 2021-01-01 a 2 1 10 0.15 0.07 0.76
1 2021-01-01 b 2 0 20 0.09 0.00 0.90
2 2021-02-01 a 2 1 30 0.06 0.03 0.90
3 2021-02-01 b 2 3 10 0.13 0.20 0.66
4 2021-03-01 a 2 1 30 0.06 0.03 0.90
5 2021-03-01 b 2 3 20 0.08 0.12 0.80
While I can do this in kind of a dumb way if I list these out individually...
>>> df.insert(5, "var1_share", df.apply(lambda x: x["var1"] / x[["var1", "var2", "var3"]].sum(), axis=1))
>>> df
date c_id var1 var2 var3 var1_share
0 2021-01-01 a 2 1 10 0.153846
1 2021-01-01 b 2 0 20 0.090909
2 2021-02-01 a 2 1 30 0.060606
3 2021-02-01 b 2 3 10 0.133333
4 2021-03-01 a 2 1 30 0.060606
5 2021-03-01 b 2 3 20 0.080000
What's the pandas magic for iterating this procedure over some list of valid columns, mylist= ["var1", "var2", "var3"]
? I suspect there is an apply that can do this in a one-liner?
Also, pandas experts, what would this operation be called across columns of a dataframe? I'm sure this is common, but I'm not sure how I could have searched for it better.
Try this:
cols = pd.Index(['var1', 'var2', 'var3'])
df[cols+'_share'] = df[cols].div(df.sum(axis=1), axis=0)
Output:
date c_id var1 var2 var3 var1_share var2_share var3_share
0 2021-01-01 a 2 1 10 0.153846 0.076923 0.769231
1 2021-01-01 b 2 0 20 0.090909 0.000000 0.909091
2 2021-02-01 a 2 1 30 0.060606 0.030303 0.909091
3 2021-02-01 b 2 3 10 0.133333 0.200000 0.666667
4 2021-03-01 a 2 1 30 0.060606 0.030303 0.909091
5 2021-03-01 b 2 3 20 0.080000 0.120000 0.800000
Let's use pandas intrinsic data alignment and pd.DataFrame.div
with parameter axis=0
and pd.DataFrame.sum
with axis=1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With