Somebody helped me with some code. I understand everything in it except the very last line, .transform('first'). I can see what it does, but I'd like to know precisely what happens behind the scenes to produce this result.
This is the part of the code I understand:
df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]
                .groupby(df['Date'].dt.year)
                .Value
                .cumsum()
                .sub(df['Value'])
                .add(df['YTD'])
            )
This is the output of this first part:
Value Type Date YTD YEP
0 100 Budget 2019-01-01 101.0 974.0
1 50 Budget 2019-02-01 199.0 1022.0
2 20 Budget 2019-03-01 275.0 1078.0
3 123 Budget 2019-04-01 332.0 1012.0
4 56 Budget 2019-05-01 NaN NaN
5 76 Budget 2019-06-01 NaN NaN
6 98 Budget 2019-07-01 NaN NaN
7 126 Budget 2019-08-01 NaN NaN
8 90 Budget 2019-09-01 NaN NaN
9 80 Budget 2019-10-01 NaN NaN
10 67 Budget 2019-11-01 NaN NaN
11 87 Budget 2019-12-01 NaN NaN
12 101 Actual 2019-01-01 101.0 NaN
13 98 Actual 2019-02-01 199.0 NaN
14 76 Actual 2019-03-01 275.0 NaN
15 57 Actual 2019-04-01 332.0 NaN
This is the entire code:
df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]
                .groupby(df['Date'].dt.year)
                .Value
                .cumsum()
                .sub(df['Value'])
                .add(df['YTD'])
                .groupby(df['Date'])
                .transform('first') )
I got this after running the entire code:
Value Type Date YTD YEP
0 100 Budget 2019-01-01 101.0 974.0
1 50 Budget 2019-02-01 199.0 1022.0
2 20 Budget 2019-03-01 275.0 1078.0
3 123 Budget 2019-04-01 332.0 1012.0
4 56 Budget 2019-05-01 NaN NaN
5 76 Budget 2019-06-01 NaN NaN
6 98 Budget 2019-07-01 NaN NaN
7 126 Budget 2019-08-01 NaN NaN
8 90 Budget 2019-09-01 NaN NaN
9 80 Budget 2019-10-01 NaN NaN
10 67 Budget 2019-11-01 NaN NaN
11 87 Budget 2019-12-01 NaN NaN
12 101 Actual 2019-01-01 101.0 974.0
13 98 Actual 2019-02-01 199.0 1022.0
14 76 Actual 2019-03-01 275.0 1078.0
15 57 Actual 2019-04-01 332.0 1012.0
I know that "transform" is like "apply", but I don't get what it means to apply (or transform) with the parameter 'first'. What does 'first' do here when combined with transform?
What does 'first' mean?
The argument of the .transform() method may be a NumPy function, a string naming a function, or a user-defined function. In the line
.transform('first')
it is a string naming a function, so it stands for the function first().
Where is the function first() coming from?
It is the GroupBy method .first().
What does the function first() return?
It returns the first non-NaN value in a series, or NaN if there is none.
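As a small illustration of that behaviour, here is a sketch on a made-up frame. The names df_demo, key and val are assumptions for the example and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "x" starts with a NaN, group "y" is all NaN.
df_demo = pd.DataFrame({
    "key": ["x", "x", "x", "y", "y"],
    "val": [np.nan, 5.0, 7.0, np.nan, np.nan],
})

# GroupBy.first() skips NaN and yields one value per group.
firsts = df_demo.groupby("key")["val"].first()
print(firsts)  # x -> 5.0 (the leading NaN is skipped), y -> NaN
```

Note that the result has one row per group, not one row per original row. Getting back to the original length is the job of .transform().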
What does the method .transform() do?
It applies the function it is given to every column (that is, every Series) of each group to obtain a new (transformed) column. Then it returns a DataFrame made of these transformed columns, indexed like the original.
When called on a grouped Series, it returns a transformed Series.
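The difference between the DataFrame and Series cases can be seen in a minimal sketch. The frame df_demo and its columns key, p and q are invented for this example.

```python
import pandas as pd

# Hypothetical frame with one grouping column and two value columns.
df_demo = pd.DataFrame({
    "key": ["a", "b", "a", "b"],
    "p": [1, 2, 3, 4],
    "q": [10.0, 20.0, 30.0, 40.0],
})

# On a grouped DataFrame, every value column is transformed.
out_df = df_demo.groupby("key").transform("first")

# On a grouped Series, a single transformed Series comes back.
out_s = df_demo.groupby("key")["p"].transform("first")

print(out_df)  # columns p and q, same index as df_demo
print(out_s)   # 1, 2, 1, 2
```

In both cases the result keeps the index of the input, which is what allows it to be assigned straight back as a new column.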
Does this mean that the function passed to .transform() must return a series of the same size?
No, that is only one possibility.
The other is a scalar, which is broadcast (repeated) to make a series of the same size.
The function used here (the GroupBy method first()) is a good example of such a function.
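Both possibilities can be sketched with user-defined functions. The data and the names df_demo, broadcast and centered below are assumptions made up for the illustration.

```python
import pandas as pd

# Hypothetical data with two groups of two rows each.
df_demo = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [1.0, 3.0, 10.0, 20.0],
})
g = df_demo.groupby("key")["val"]

# The function returns a scalar: it is repeated for every row of the group.
broadcast = g.transform(lambda s: s.max())

# The function returns a series of the same size: it is used as is.
centered = g.transform(lambda s: s - s.mean())

print(broadcast.tolist())  # [3.0, 3.0, 20.0, 20.0]
print(centered.tolist())   # [-1.0, 1.0, -5.0, 5.0]
```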
So what does the method .transform('first') return?
It returns a Series / DataFrame with the same shape and index as the input, in which, within each group, all values of every individual column are replaced with the first non-NaN value of that column in that group, or with NaN if there is none.
The lines
.groupby(df['Date'])
.transform('first')
first split your (intermediate) series into groups, one per date, and then, just before recombining them, apply the first() function to the series in every group.
This effectively replaces every value in a group with the first non-NaN value of that group, if such a value exists.
As a result, in the final series (your new column) every value of the intermediate series is replaced with the first non-NaN value found for the same date, if there is one. That is why the Actual rows (12 to 15), which were NaN after the first part, end up with the YEP value of the Budget row that has the same date, while the dates from May to December, which have no non-NaN value at all, stay NaN.
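To check this against the numbers in the question, the sketch below rebuilds the frame from the printed output and runs the two parts separately. How df is constructed here is my assumption, since the question only shows its printed form, and the name intermediate is introduced just for this example.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame from its printed output (assumed construction).
budget = [100, 50, 20, 123, 56, 76, 98, 126, 90, 80, 67, 87]
actual = [101, 98, 76, 57]
ytd = [101.0, 199.0, 275.0, 332.0]
months = list(range(1, 13)) + list(range(1, 5))
df = pd.DataFrame({
    "Value": budget + actual,
    "Type": ["Budget"] * 12 + ["Actual"] * 4,
    "Date": pd.to_datetime([f"2019-{m:02d}-01" for m in months]),
    "YTD": ytd + [np.nan] * 8 + ytd,
})

# First part: the chain from the question, without the last two lines.
intermediate = (
    df[::-1].loc[df["Type"].eq("Budget")]
    .groupby(df["Date"].dt.year)
    .Value
    .cumsum()
    .sub(df["Value"])
    .add(df["YTD"])
)

# Last two lines: copy the first non-NaN value to every row with the same date.
df["YEP"] = intermediate.groupby(df["Date"]).transform("first")

print(intermediate.loc[[0, 12]])  # 974.0 for row 0, NaN for row 12
print(df.loc[[0, 12], "YEP"])     # 974.0 for both rows
```

Rows 0 and 12 share the date 2019-01-01, so after the transform both carry the value that only row 0 had before.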