
How to calculate the average of the three most recent non-NaN values using Python

I have a DataFrame df that looks like the following. For each row, I want to calculate the average of the last three non-NaN values. If a row has fewer than three non-missing values, the average should be missing.

name  day1  day2  day3  day4  day5  day6  day7
A     1     1     nan   2     3     0     3
B     nan   nan   nan   nan   nan   nan   3
C     1     1     0     1     1     1     1
D     1     1     0     1     nan   1     4

The expected output should look like the following:

name  day1  day2  day3  day4  day5  day6  day7  expected
A     1     1     nan   2     3     0     3     2         <-  1/3 * (day5 + day6 + day7)
B     nan   nan   nan   nan   nan   nan   3     nan       <-  fewer than 3 non-missing values
C     1     1     0     1     1     1     1     1         <-  1/3 * (day5 + day6 + day7)
D     1     1     0     1     nan   1     4     2         <-  1/3 * (day4 + day6 + day7)

I know how to calculate the average of the last three columns and how to count the non-missing observations in them:

df.iloc[:, -3:].mean(axis=1)    # average of the last three columns
df.iloc[:, -3:].count(axis=1)   # number of non-NaN values in the last three columns

If a row has fewer than three non-missing observations, I know how to set the average to missing using the mask df.iloc[:, 1:].count(axis=1) < 3.
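As a rough illustration of the two checks described above, the sketch below rebuilds the sample data and shows why averaging the last three columns alone is not enough. It assumes that name is kept as an ordinary first column; the variable names last3_mean and too_few are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "name" is assumed to be a regular column
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "day1": [1, np.nan, 1, 1],
        "day2": [1, np.nan, 1, 1],
        "day3": [np.nan, np.nan, 0, 0],
        "day4": [2, np.nan, 1, 1],
        "day5": [3, np.nan, 1, np.nan],
        "day6": [0, np.nan, 1, 1],
        "day7": [3, 3, 1, 4],
    }
)

# Mean of day5..day7 only; NaN entries are skipped, not replaced by earlier days
last3_mean = df.iloc[:, -3:].mean(axis=1)

# Rows with fewer than three non-NaN day values should end up as NaN
too_few = df.iloc[:, 1:].count(axis=1) < 3
```

For row D this yields 2.5 (the mean of day6 and day7 only) rather than the desired 2, because the NaN in day5 is skipped instead of being replaced by day4.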

But I am struggling to find a way to calculate the average of the last three non-missing values in each row. Can anyone show me how to solve this, please?

fly36 asked Dec 26 '18 20:12





1 Answer

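A vectorized approach using justify, a user-defined helper (not part of NumPy or pandas) that pushes the non-NaN values of each row to one side:

import numpy as np
N = 3  # number of most recent non-NaN entries to average
# Assumes df.values holds only the numeric day columns and that justify is already defined
avg = np.mean(justify(df.values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
df['expected'] = avg  # NaN for rows with fewer than N non-NaN values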
Divakar answered Nov 09 '22 23:11
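Since justify is not defined in the answer, here is a minimal self-contained sketch of the same right-justify idea. It is not the answerer's helper: the function name mean_last_n_valid is hypothetical, and the sample frame assumes name is an ordinary column. A stable argsort on the validity mask moves the NaNs of each row to the left while keeping the valid values in their original order.

```python
import numpy as np
import pandas as pd

def mean_last_n_valid(values, n=3):
    """Row-wise mean of the last n non-NaN entries; NaN if a row has fewer than n."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    # False sorts before True, so NaNs go left; "stable" preserves the order of valid values
    order = np.argsort(valid, axis=1, kind="stable")
    justified = np.take_along_axis(values, order, axis=1)
    means = justified[:, -n:].mean(axis=1)
    # Explicitly blank out rows that do not have n valid values
    return np.where(valid.sum(axis=1) >= n, means, np.nan)

# Sample data from the question; "name" is assumed to be a regular column
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "day1": [1, np.nan, 1, 1],
        "day2": [1, np.nan, 1, 1],
        "day3": [np.nan, np.nan, 0, 0],
        "day4": [2, np.nan, 1, 1],
        "day5": [3, np.nan, 1, np.nan],
        "day6": [0, np.nan, 1, 1],
        "day7": [3, 3, 1, 4],
    }
)

# Select only the day columns so the array is purely numeric
day_values = df.filter(like="day").to_numpy(dtype=float)
df["expected"] = mean_last_n_valid(day_values, n=3)
```

On the sample data this reproduces the expected column from the question: 2, NaN, 1 and 2 for rows A to D.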
