
How to replace NA separately with linear interpolation in R

Tags: dataframe, r, dplyr

I've looked up some web pages (but their results don't meet my needs):

  • NA replacing with blanks

  • Replacing "NA" (NA string) with NA inplace data.table

  • replace <NA> with NA.

I want to write a function that could do this:

Say there is a vector a.

a = c(100000, 137862, NA, NA, NA, 178337, NA, NA, NA, NA, NA, 295530)

First, find the values immediately before and after each single NA or run of consecutive NAs. Here the segments are 137862, NA, NA, NA, 178337 and 178337, NA, NA, NA, NA, NA, 295530.

Second, calculate the slope of each segment and use it to replace the NAs.

# 137862, NA, NA, NA, 178337
slope_1 = (178337 - 137862)/4

137862 + slope_1*1 # 1st NA replace with 147980.8
137862 + slope_1*2 # 2nd NA replace with 158099.5
137862 + slope_1*3 # 3rd NA replace with 168218.2

# 178337, NA, NA, NA, NA, NA, 295530

slope_2 = (295530 - 178337)/6

178337 + slope_2*1 # 4th NA replace with 197869.2
178337 + slope_2*2 # 5th NA replace with 217401.3
178337 + slope_2*3 # 6th NA replace with 236933.5
178337 + slope_2*4 # 7th NA replace with 256465.7
178337 + slope_2*5 # 8th NA replace with 275997.8

Finally, the expected vector should be this:

a_without_NA = c(100000, 137862, 147980.8, 158099.5, 168218.2, 178337, 197869.2, 217401.3, 
                 236933.5, 256465.7, 275997.8, 295530)

If a single NA or a run of consecutive NAs is at the beginning or the end of the vector, it should be kept.

# NA at the beginning
b = c(NA, NA, 1, 3, NA, 5, 7)

# 3, NA, 5
slope_1 = (5-3)/2
3 + slope_1*1 # 3rd NA replace with 4
b_without_NA = c(NA, NA, 1, 3, 4, 5, 7)

# NA at the end
c = c(1, 3, NA, 5, 7, NA, NA)

# 3, NA, 5
slope_1 = (5-3)/2
3 + slope_1*1 # 1st NA replace with 4
c_without_NA = c(1, 3, 4, 5, 7, NA, NA)

Note: in my real data, the vector is strictly increasing (vector[n + 1] > vector[n]).

I know the principle, but I don't know how to write a user-defined function to implement it.

Any help will be highly appreciated!

asked Dec 05 '22 08:12 by zhiwei li


1 Answer

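zoo's na.approx can help. With na.rm = FALSE, leading and trailing NAs are kept, so the result has the same length as the input: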

a = c(100000, 137862, NA, NA, NA, 178337, NA, NA, NA, NA, NA, 295530)
zoo::na.approx(a, na.rm = FALSE)

# [1] 100000.0 137862.0 147980.8 158099.5 168218.2 178337.0 197869.2 217401.3
# [9] 236933.5 256465.7 275997.8 295530.0

b = c(NA, NA, 1, 3, NA, 5, 7)

zoo::na.approx(b, na.rm = FALSE)
# [1] NA NA  1  3  4  5  7

c = c(1, 3, NA, 5, 7, NA, NA)
zoo::na.approx(c, na.rm = FALSE)
# [1]  1  3  4  5  7 NA NA
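
If you would rather not depend on zoo, the steps described in the question can also be written by hand in base R. The following is only a minimal sketch: the function name fill_na_linear is made up for illustration, and it assumes a numeric vector in which interior NAs should be filled on a straight line between their nearest non-NA neighbours.

```r
# Hypothetical helper (not part of any package): fills interior NAs by
# linear interpolation and leaves leading and trailing NAs untouched.
fill_na_linear <- function(x) {
  known <- which(!is.na(x))            # positions of the observed values
  if (length(known) < 2) return(x)     # no pair of values to interpolate between

  for (k in seq_len(length(known) - 1)) {
    left  <- known[k]
    right <- known[k + 1]
    gap   <- right - left              # number of steps between the two values
    if (gap > 1) {                     # at least one NA lies in between
      slope <- (x[right] - x[left]) / gap
      steps <- seq_len(gap - 1)
      x[left + steps] <- x[left] + slope * steps
    }
  }
  x
}

fill_na_linear(c(NA, NA, 1, 3, NA, 5, 7))   # NA NA 1 3 4 5 7
fill_na_linear(c(1, 3, NA, 5, 7, NA, NA))   # 1 3 4 5 7 NA NA
```

Because the loop only visits pairs of neighbouring non-NA positions, NAs before the first value or after the last one are never touched. Base R's approx() performs the same interpolation when given the non-NA positions and values, so it can serve as a cross-check.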
answered Dec 22 '22 00:12 by Ronak Shah