I have a dataframe df that looks like the following. For each row, I want to calculate the average of the last 3 non-NaN values. If a row has fewer than three non-missing values, the average should be missing.
name  day1  day2  day3  day4  day5  day6  day7
A     1     1     nan   2     3     0     3
B     nan   nan   nan   nan   nan   nan   3
C     1     1     0     1     1     1     1
D     1     1     0     1     nan   1     4
The expected output should look like the following:
name  day1  day2  day3  day4  day5  day6  day7  expected
A     1     1     nan   2     3     0     3     2         <- 1/3 * (day5 + day6 + day7)
B     nan   nan   nan   nan   nan   nan   3     nan       <- less than 3 non-missing
C     1     1     0     1     1     1     1     1         <- 1/3 * (day5 + day6 + day7)
D     1     1     0     1     nan   1     4     2         <- 1/3 * (day4 + day6 + day7)
I know how to calculate the average of the last three columns and count how many non-missing observations they contain.
df.iloc[:, 5:8].mean(axis=1)    # average of the last three columns
df.iloc[:, 5:8].count(axis=1)   # number of non-NaN values in the last three columns
If there are fewer than 3 non-missing observations, I know how to set the average to missing using df.iloc[:, 1:8].count(axis=1) < 3.
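For reference, the two building blocks described above can be sketched as follows. This is only a minimal illustration that rebuilds the sample frame by hand and assumes name is an ordinary column at position 0. Because iloc slices exclude the end position, day5 to day7 sit at positions 5:8. The variable names (last3, too_few and so on) are made up for the example.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; 'name' is assumed to be a regular column.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "day1": [1, nan, 1, 1],
    "day2": [1, nan, 1, 1],
    "day3": [nan, nan, 0, 0],
    "day4": [2, nan, 1, 1],
    "day5": [3, nan, 1, nan],
    "day6": [0, nan, 1, 1],
    "day7": [3, 3, 1, 4],
})

last3 = df.iloc[:, 5:8]                      # day5, day6, day7
last3_mean = last3.mean(axis=1)              # NaN values are skipped by default
last3_count = last3.count(axis=1)            # non-NaN values in the last three columns
total_count = df.iloc[:, 1:8].count(axis=1)  # non-NaN values across all day columns
too_few = total_count < 3                    # rows whose average should be NaN
```

Row D shows why this is not enough on its own: last3_mean averages only day6 and day7 there, whereas the desired result also reaches back to day4.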
But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?
To compute a mean, use the mean() function. To fill gaps, calculate the mean of the column that contains NaN and pass it to fillna() to replace the NaN values with that mean.
To get a column average from a pandas DataFrame, use either the mean() or the describe() method. The DataFrame.mean() method returns the mean of the values along the requested axis.
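A small sketch of the methods mentioned above, using a made-up two-column frame. It only illustrates mean(), describe() and fillna(); filling NaN values changes the data, so it does not by itself give the average of the last three non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, just to show the calls.
df = pd.DataFrame({
    "day1": [1.0, np.nan, 1.0, 1.0],
    "day7": [3.0, 3.0, 1.0, 4.0],
})

col_means = df.mean()          # per-column mean; NaN is skipped by default
row_means = df.mean(axis=1)    # per-row mean
summary = df.describe()        # summary table that includes a 'mean' row
filled = df.fillna(col_means)  # each NaN replaced by its column's mean
```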
A vectorized solution using justify:
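import numpy as np
N = 3  # number of trailing non-NaN entries to average
# justify() is a user-defined helper, not a NumPy function; it packs each row's non-NaN values to the right.
# df.values must contain only the numeric day columns (e.g. with name as the index).
avg = np.mean(justify(df.values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
df['expected'] = avg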
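The snippet above relies on a justify helper whose definition is not shown here. As a stand-in, here is a minimal self-contained sketch of the same idea using only NumPy: push each row's non-NaN values to the right, then average the last N. The function name last_n_valid_mean is hypothetical, and the code assumes name is a regular column that has to be skipped.

```python
import numpy as np
import pandas as pd

def last_n_valid_mean(values, n=3):
    """Row-wise mean of the last n non-NaN entries; NaN when a row has fewer than n."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    # A stable sort on the boolean mask moves valid entries to the right
    # while keeping their original left-to-right order.
    order = np.argsort(valid, axis=1, kind="stable")
    right_justified = np.take_along_axis(values, order, axis=1)
    result = right_justified[:, -n:].mean(axis=1)
    # Explicitly blank out rows that do not have n valid entries.
    result[valid.sum(axis=1) < n] = np.nan
    return result

nan = np.nan
# Sample data from the question.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "day1": [1, nan, 1, 1],
    "day2": [1, nan, 1, 1],
    "day3": [nan, nan, 0, 0],
    "day4": [2, nan, 1, 1],
    "day5": [3, nan, 1, nan],
    "day6": [0, nan, 1, 1],
    "day7": [3, 3, 1, 4],
})

# Skip the 'name' column and work on the day columns only.
df["expected"] = last_n_valid_mean(df.iloc[:, 1:], n=3)
```

On the sample data this reproduces the expected column from the question: row D averages day4, day6 and day7, and row B stays NaN.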