How do I impute missing variables in R using dplyr?

Question

I would like to impute missing values for a variable given the existing values. In var2, we notice that there are a lot of NAs.

If any 2 ids are the same, then their values for var2 are the same.
If the id has no values for var2, like in the case of id==2, then we just output as NA.

The data should go from df_old to df_new:

 df_old<- read.table(header = TRUE, text = "
 id  var1  var2 
  1  A       12    
  1  B       NA    
  1  E       NA    
  2  G       NA
  2  J       NA
 ")

df_new<- read.table(header = TRUE, text = "
id  var1  var2 
 1  A       12    
 1  B       12    
 1  E       12    
 2  G       NA
 2  J       NA
")

I tried this:

df_new<-df_old %>%
        group_by(id) %>%
        mutate(var2=na.omit(var2))

I believe it doesn't work because of the second case, where an id has no non-missing values. I was also wondering whether using ifelse would be okay. Any help is appreciated, thanks!
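To make the failure concrete, here is a minimal base R sketch that counts how many values survive na.omit() within each id. The vectors below simply restate the id and var2 columns of df_old; the name kept is only illustrative.

```r
# The id and var2 columns from df_old, written out as plain vectors.
id   <- c(1, 1, 1, 2, 2)
var2 <- c(12L, NA, NA, NA, NA)

# Number of values left in each id after na.omit().
kept <- tapply(var2, id, function(x) length(na.omit(x)))
kept
```

id 1 keeps one value, which mutate can recycle across the group, but id 2 keeps none, and a zero-length result cannot be recycled to a group of two rows.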

erc · Accepted Answer

If there is at most one distinct non-missing var2 value per id, you could simply do:

df_old %>%
  group_by(id) %>%
  mutate(var2 = min(var2, na.rm = TRUE))

Source: local data frame [5 x 3]
Groups: id [2]

     id   var1  var2
  <int> <fctr> <int>
1     1      A    12
2     1      B    12
3     1      E    12
4     2      G    NA
5     2      J    NA
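The output above shows NA for id 2. In base R, min() over an all-NA vector with na.rm = TRUE returns Inf with a warning, so the result for that group may depend on the dplyr version in use. A minimal sketch that does not rely on this, assuming var2 is read as an integer column, guards the all-NA case explicitly:

```r
library(dplyr)

df_old <- read.table(header = TRUE, text = "
id  var1  var2
 1  A       12
 1  B       NA
 1  E       NA
 2  G       NA
 2  J       NA
")

df_new <- df_old %>%
  group_by(id) %>%
  # Return a typed NA when the whole group is missing, otherwise the minimum.
  mutate(var2 = if (all(is.na(var2))) NA_integer_ else min(var2, na.rm = TRUE)) %>%
  ungroup()

df_new
```

Using NA_integer_ rather than plain NA keeps the column type the same across groups.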

Another option, which only works if the non-missing value is always the first row within each id, would be:

mutate(var2 = var2[1])
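If the non-missing value is not guaranteed to come first, one possible variation is to pick the first non-NA value instead. This is a sketch, not part of the original answer; df_shuffled is a hypothetical reordering of df_old used only to show the difference.

```r
library(dplyr)

# Hypothetical data: same values as df_old, but the 12 is not in the first row.
df_shuffled <- read.table(header = TRUE, text = "
id  var1  var2
 1  B       NA
 1  A       12
 1  E       NA
 2  G       NA
 2  J       NA
")

df_filled <- df_shuffled %>%
  group_by(id) %>%
  # Subsetting an empty vector with [1] yields NA, so all-NA groups stay NA.
  mutate(var2 = var2[!is.na(var2)][1]) %>%
  ungroup()

df_filled
```

tidyr's fill() with .direction = "downup" applied to the grouped data is another commonly used route to the same result.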

akrun · Answer

We can also use data.table. Unlike the dplyr output shown above, a group whose var2 values are all NA has to be handled explicitly by returning NA; otherwise min(var2, na.rm = TRUE) gives Inf for that group.

library(data.table)
setDT(df_old)[, var2 := if(any(!is.na(var2))) min(var2, na.rm = TRUE) 
            else NA_integer_, by = id]
df_old    
#    id var1 var2
#1:  1    A   12
#2:  1    B   12
#3:  1    E   12
#4:  2    G   NA
#5:  2    J   NA
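The Inf mentioned above comes from base R's min(), not from data.table itself. The sketch below shows that behaviour and a possible alternative that avoids the if/else by taking the first non-NA value per group. The names all_na_min and dt are illustrative, and as.data.table() is used so that df_old is left unmodified.

```r
library(data.table)

df_old <- read.table(header = TRUE, text = "
id  var1  var2
 1  A       12
 1  B       NA
 1  E       NA
 2  G       NA
 2  J       NA
")

# min() over an all-NA vector with na.rm = TRUE returns Inf (and warns).
all_na_min <- suppressWarnings(min(c(NA_integer_, NA_integer_), na.rm = TRUE))

# Work on a copy so df_old itself is not changed by reference.
dt <- as.data.table(df_old)

# First non-NA value per id; an empty subset indexed with [1L] gives NA.
dt[, var2 := var2[!is.na(var2)][1L], by = id]
dt
```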

juhariis · Answer

There is now a tidyimpute package available on CRAN, which looks like it might do the trick:

"Functions and methods for imputing missing values (NA) in tables and list patterned after the tidyverse approach of 'dplyr' and 'rlang'; works with data.tables as well."

https://cran.r-project.org/web/packages/tidyimpute/tidyimpute.pdf

Tags:

r

data-manipulation

dplyr

HNSKD
