I'm having trouble finding the non-for loop version of this. Say this is my input:
In [94]: df
Out[94]:
N experiment color value value2
0 145000 0 red 0.30 0.363953
1 14000 1 red 0.31 0.218978
2 13000 2 red 0.29 0.948070
3 15000 0 yellow 0.31 0.620201
4 1200 1 yellow 0.32 0.567513
5 1400 2 yellow 0.31 0.318197
6 40000 0 green 0.29 0.947226
7 3000 1 green 0.31 0.084243
8 7000 2 green 0.32 0.961020
[9 rows x 5 columns]
Experiment 0 is my control. I performed this experiment for various colors. I want to normalize all rows by the matching color experiment 0.
In [104]: df
Out[104]:
N experiment color value value2 scaled_value scaled_value2
0 145000 0 red 0.30 0.363953 1.000000 1.000000
1 14000 1 red 0.31 0.218978 1.033333 0.590786
2 13000 2 red 0.29 0.948070 0.966667 2.604732
3 15000 0 yellow 0.31 0.620201 1.000000 1.000000
4 1200 1 yellow 0.32 0.567513 1.032258 0.914220
5 1400 2 yellow 0.31 0.318197 1.000000 0.512737
6 40000 0 green 0.29 0.947226 1.000000 1.000000
7 3000 1 green 0.31 0.084243 1.068966 0.088680
8 7000 2 green 0.32 0.961020 1.103448 1.014541
[9 rows x 7 columns]
It seems I would want to exact the experiment zero parameters to their own column to permit easy division, but I can't get the stacking/pivoting/concat'ing to work properly. I did this with for loops using .at and assigning values but it felt so wrong.
The next steps would be normalizing several columns and creating upplower and lower error bounds calculated from (N, value), (N, value2), (N, other_values) etc.
One way would be to use transform
(here using idxmin
, although there are many alternatives) to get the indices of the rows we want to use as the denominator:
>>> ii = df.groupby("color")["experiment"].transform("idxmin")
>>> cols = ["value", "value2"]
>>> new_cols = (df.loc[:,cols] /df.loc[ii, cols].values)
>>> df.join(new_cols.rename(columns=lambda x: "scaled_" + x))
N experiment color value value2 scaled_value scaled_value2
0 145000 0 red 0.30 0.363953 1.000000 1.000000
1 14000 1 red 0.31 0.218978 1.033333 0.601666
2 13000 2 red 0.29 0.948070 0.966667 2.604924
3 15000 0 yellow 0.31 0.620201 1.000000 1.000000
4 1200 1 yellow 0.32 0.567513 1.032258 0.915047
5 1400 2 yellow 0.31 0.318197 1.000000 0.513055
6 40000 0 green 0.29 0.947226 1.000000 1.000000
7 3000 1 green 0.31 0.084243 1.068966 0.088937
8 7000 2 green 0.32 0.961020 1.103448 1.014563
Step by step, first we find the denominator indices:
>>> ii = df.groupby("color")["experiment"].transform("idxmin")
>>> ii
0 0
1 0
2 0
3 3
4 3
5 3
6 6
7 6
8 6
dtype: int64
Then we can use this to index into the frame:
>>> df.loc[ii, cols]
value value2
0 0.30 0.363953
0 0.30 0.363953
0 0.30 0.363953
3 0.31 0.620201
3 0.31 0.620201
3 0.31 0.620201
6 0.29 0.947226
6 0.29 0.947226
6 0.29 0.947226
Since we're handling the alignment ourselves, we need to call .values
to drop down to the underlying array-- otherwise pandas will try to outsmart us and align things correctly based on the indices.
Then we divide:
>>> (df.loc[:,cols] /df.loc[ii, cols].values)
value value2
0 1.000000 1.000000
1 1.033333 0.601666
2 0.966667 2.604924
3 1.000000 1.000000
4 1.032258 0.915047
5 1.000000 0.513055
6 1.000000 1.000000
7 1.068966 0.088937
8 1.103448 1.014563
and finally join them, renaming the new columns:
>>> df.join(new_cols.rename(columns=lambda x: "scaled_" + x))
N experiment color value value2 scaled_value scaled_value2
0 145000 0 red 0.30 0.363953 1.000000 1.000000
1 14000 1 red 0.31 0.218978 1.033333 0.601666
2 13000 2 red 0.29 0.948070 0.966667 2.604924
3 15000 0 yellow 0.31 0.620201 1.000000 1.000000
4 1200 1 yellow 0.32 0.567513 1.032258 0.915047
5 1400 2 yellow 0.31 0.318197 1.000000 0.513055
6 40000 0 green 0.29 0.947226 1.000000 1.000000
7 3000 1 green 0.31 0.084243 1.068966 0.088937
8 7000 2 green 0.32 0.961020 1.103448 1.014563
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With