I would like to impute missing values for a variable given the existing values.
In var2, we notice that there are a lot of NAs. Within each id, the non-missing values of var2 are the same. If an id has no value for var2 at all, like in the case of id==2, then we just output NA. The result should go from df_old to df_new.
df_old<- read.table(header = TRUE, text = "
id var1 var2
1 A 12
1 B NA
1 E NA
2 G NA
2 J NA
")
df_new<- read.table(header = TRUE, text = "
id var1 var2
1 A 12
1 B 12
1 E 12
2 G NA
2 J NA
")
I tried:
df_new<-df_old %>%
group_by(id) %>%
mutate(var2=na.omit(var2))
I believe it doesn't work because of the second case. I was also wondering if using ifelse would be okay. Need help thanks!
If there is only one var2 value available per id, you could simply do:
df_old %>%
group_by(id) %>%
mutate(var2 = min(var2, na.rm = TRUE))
Source: local data frame [5 x 3]
Groups: id [2]
id var1 var2
<int> <fctr> <int>
1 1 A 12
2 1 B 12
3 1 E 12
4 2 G NA
5 2 J NA
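The output above depends on the dplyr version in use. In base R, min() with na.rm = TRUE on a vector that contains only NA returns Inf with a warning, and recent dplyr releases appear to pass that through, so id 2 may come out as Inf rather than NA. A minimal sketch that guards the all-NA case explicitly is shown below. It uses a plain if/else rather than ifelse, on the assumption that one decision per group is what is wanted; it reuses df_old from the question.

```r
library(dplyr)

# Same data as df_old in the question
df_old <- read.table(header = TRUE, text = "
id var1 var2
1 A 12
1 B NA
1 E NA
2 G NA
2 J NA
")

df_new <- df_old %>%
  group_by(id) %>%
  # If the whole group is NA, return a typed NA; otherwise take the
  # single known value (min is just a convenient way to pick it).
  mutate(var2 = if (all(is.na(var2))) NA_integer_ else min(var2, na.rm = TRUE)) %>%
  ungroup()

df_new
```

Using NA_integer_ keeps var2 an integer column, so the groups with and without a known value combine without type coercion.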
Another option, if the non-missing value is always in the first row of each id, would be:
mutate(var2 = var2[1])
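Note that var2[1] only works when the known value happens to sit in the first row of each group. A small sketch of a position-independent variant follows; the data frame df_shuffled is a hypothetical reordering of the question's data, made up here to show the difference. Indexing an empty vector with [1] yields NA in base R, so all-NA groups stay NA without a special case.

```r
library(dplyr)

# Hypothetical variant of df_old where the known value is NOT in the first row
df_shuffled <- read.table(header = TRUE, text = "
id var1 var2
1 A NA
1 B 12
1 E NA
2 G NA
2 J NA
")

df_filled <- df_shuffled %>%
  group_by(id) %>%
  # Drop the NAs, then take the first remaining value;
  # for an all-NA group the subset is empty and [1] gives NA.
  mutate(var2 = var2[!is.na(var2)][1]) %>%
  ungroup()

df_filled
```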
We can use data.table, but unlike the dplyr result above, for groups that are all NA we have to return NA explicitly, or else min() will give Inf:
library(data.table)
setDT(df_old)[, var2 := if(any(!is.na(var2))) min(var2, na.rm = TRUE)
else NA_integer_, by = id]
df_old
# id var1 var2
#1: 1 A 12
#2: 1 B 12
#3: 1 E 12
#4: 2 G NA
#5: 2 J NA
There is now a tidyimpute package available on CRAN that looks like it might do the trick:
"Functions and methods for imputing missing values (NA) in tables and list patterned after the tidyverse approach of 'dplyr' and 'rlang'; works with data.tables as well."
https://cran.r-project.org/web/packages/tidyimpute/tidyimpute.pdf