When I visualized my data, each row showed a series of periodic patterns. However, random-forest imputation and PCA imputation both create outliers.
Main problem:
So I want to impute each missing value from its neighbours in the same row: when the value in the nth column is NA, replace it with the average of columns n-1 and n+1 of that row.
Sub-problems:
The first and last columns don't have an n-1 or n+1 neighbour, so for those I will just take column n+1 or n-1 respectively. (Don't worry, the deviation within a row is very small.)
NA values can also occur in consecutive columns of a row. In that case I also just take whichever of n-1 or n+1 is available.
Example:
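To make the rule concrete, here is a minimal base-R sketch for a single row, written as a numeric vector. `impute_row` is a hypothetical helper name, and it assumes neighbours are taken from the original values (not from already-imputed ones), which is what the desired output below shows for row 3. If both neighbours are missing (three or more NAs in a row), the sketch leaves NaN in that cell.

```r
# Hypothetical helper: fill NAs in one row (a numeric vector) with the mean
# of whichever immediate neighbours are available in the ORIGINAL vector.
impute_row <- function(x) {
  n <- length(x)
  left  <- c(NA, x[-n])   # value in column n-1 (NA for the first column)
  right <- c(x[-1], NA)   # value in column n+1 (NA for the last column)
  # Mean of the available neighbours; NaN if both neighbours are NA
  nb <- rowMeans(cbind(left, right), na.rm = TRUE)
  x[is.na(x)] <- nb[is.na(x)]   # only the missing cells are replaced
  x
}

impute_row(c(3, 6, NA, NA, 14, 15))
# [1]  3  6  6 14 14 15
```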
tr <- structure(list(A_1 = c(NA, 2, 3, 4, 5), A_2 = c(4, 5, 6, NA, 8), A_3 = c(7, 9, NA, 10, 11),
                     A_4 = c(10, 12, NA, 13, NA), A_5 = c(12, NA, 14, 15, 16), A_6 = c(13, 15, 15, 16, 17)),
                row.names = c(NA, -5L), class = "data.frame")
> tr
A_1 A_2 A_3 A_4 A_5 A_6
1 NA 4 7 10 12 13
2 2 5 9 12 NA 15
3 3 6 NA NA 14 15
4 4 NA 10 13 15 16
5 5 8 11 NA 16 17
Desired output
> tr
A_1 A_2 A_3 A_4 A_5 A_6
1 4 4 7 10 12 13
2 2 5 9 12 13.5 15
3 3 6 6 14 14 15
4 4 7 10 13 15 16
5 5 8 11 13.5 16 17
One way with dplyr is to convert to long format, take the lag() and lead() of the value column (the left and right neighbours), compute their mean, use it to replace the NAs, and convert back to wide format, i.e.:
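library(dplyr)
library(tidyr)
tr %>%
  mutate(row = row_number()) %>%                 # remember which row each value came from
  pivot_longer(-row) %>%
  group_by(row) %>%                              # so lag()/lead() never cross into another row
  mutate(n1 = lag(value), n2 = lead(value),      # left (n-1) and right (n+1) neighbours
         res = rowMeans(cbind(n1, n2), na.rm = TRUE),
         value = coalesce(value, res)) %>%       # fill only the missing values
  ungroup() %>%
  select(row, name, value) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-row)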
which gives,
A_1 A_2 A_3 A_4 A_5 A_6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 4 7 10 12 13
2 2 5 9 12 13.5 15
3 3 6 6 14 14 15
4 4 7 10 13 15 16
5 5 8 11 13.5 16 17
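If you'd rather avoid the long/wide round trip, the same neighbour-average rule can be sketched in base R by shifting the data one column left and right as matrices. This is a minimal sketch assuming every column of `tr` is numeric; `left`, `right`, `avg` and `tr_imputed` are just illustrative names. Like the desired output, it uses the original neighbours, so in a run of consecutive NAs each cell takes its non-missing neighbour; a cell whose neighbours are both NA stays NA.

```r
# Example data from the question
tr <- structure(list(A_1 = c(NA, 2, 3, 4, 5), A_2 = c(4, 5, 6, NA, 8),
                     A_3 = c(7, 9, NA, 10, 11), A_4 = c(10, 12, NA, 13, NA),
                     A_5 = c(12, NA, 14, 15, 16), A_6 = c(13, 15, 15, 16, 17)),
                row.names = c(NA, -5L), class = "data.frame")

m <- as.matrix(tr)                                   # assumes all columns are numeric
left  <- cbind(NA, m[, -ncol(m), drop = FALSE])      # column n-1 (NA for the first column)
right <- cbind(m[, -1, drop = FALSE], NA)            # column n+1 (NA for the last column)

# Mean of the available neighbours; NA only if both neighbours are missing
avg <- ifelse(is.na(left), right,
              ifelse(is.na(right), left, (left + right) / 2))

m[is.na(m)] <- avg[is.na(m)]                         # fill only the originally missing cells
tr_imputed <- as.data.frame(m)
tr_imputed
```

Because it works on whole matrices at once, it needs no grouping or reshaping; the trade-off is that it only handles data where every column is numeric.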