Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas data frame - lambda calculus and minimum value per series

I have a csv that contains 3 columns, count_id, AMV and time.

I am using pandas and have read this in as a data frame.

results= pd.read_csv('./output.csv')

First, I am sorting the data frame first for count_id and then for AMV.

results_sorted = results.sort_index(by=['count_id','AMV'], ascending=[True, True])

This gives

   count_id   AMV  Hour
0    16012E  4004    14
1    16012E  4026    12
2    16012E  4099    15
3    16012E  4167    11
4    16012E  4239    10
5    16012E  4324    13
6    16012E  4941    16
7    16012E  5088    17
8    16012E  5283     9
9    16012E  5620     8
10   16012E  5946    18
11   16012E  6146     7
12   16012W  3622    10
13   16012W  3904    12
14   16012W  3979    11
15   16012W  4076     9
16   16012W  4189    13
17   16012W  4870    14
18   16012W  4899    18
19   16012W  5107    15
20   16012W  5659     8
21   16012W  6325     7
22   16012W  6460    17
23   16012W  6500    16

I now want to perform some normalisation on the data so that I can ultimately plot it on the same plot. What I wish to do is find the minimum value for AMV per series (count_id) and then subtract this minimum value from the given AMV. This will give me a new column AMV_norm.

Which would look like:

   count_id   AMV  Hour  AMV_norm
0    16012E  4004    14         0
1    16012E  4026    12        22
2    16012E  4099    15        95
3    16012E  4167    11       163
4    16012E  4239    10       235
5    16012E  4324    13       320
6    16012E  4941    16       937
7    16012E  5088    17      1084
8    16012E  5283     9      1279
9    16012E  5620     8      1616
10   16012E  5946    18      1942
11   16012E  6146     7      2142
12   16012W  3622    10         0
13   16012W  3904    12       282
14   16012W  3979    11       357
15   16012W  4076     9       454
16   16012W  4189    13       567
17   16012W  4870    14      1248
18   16012W  4899    18      1277
19   16012W  5107    15      1485
20   16012W  5659     8      2037
21   16012W  6325     7      2703
22   16012W  6460    17      2838
23   16012W  6500    16      2878

How do I define the function that finds the minimum AMV value per series and not the minimum value of AMV overall? It would look something like this:

def minimum_series_value(AMV):
    return AMV.argmin()

I would then need to create a new column and using a lambda function populate that row. I know it would look something like this:

results_sorted['AMV_norm'] = results_sorted.apply(lambda row:results_sorted(row['AMV']))
like image 201
LearningSlowly Avatar asked Sep 27 '22 10:09

LearningSlowly


1 Answers

Subtract the AMV column from the transform min:

In [11]: df.groupby('count_id')["AMV"].transform('min')
Out[11]:
0     4004
1     4004
2     4004
3     4004
4     4004
...
21    3622
22    3622
23    3622
dtype: int64

In [12]: df["AMV"] - df.groupby('count_id')["AMV"].transform('min')
Out[12]:
0        0
1       22
2       95
3      163
4      235
...
21    2703
22    2838
23    2878
dtype: int64

In [13]: df["AMV_norm"] = df["AMV"] - df.groupby('count_id')["AMV"].transform('min')
like image 52
Andy Hayden Avatar answered Sep 29 '22 08:09

Andy Hayden