Replace NA values with 999 in R subsetted by ID

Tags: r, dplyr, subset

I have created the following data frame in R:

    df <- data.frame("ID" = c("A", "A", "A", "B", "B", "B"),
                     "II" = c(NA, NA, NA, 1, 2, 3),
                     "JJ" = c(1, 2, 3, NA, NA, NA),
                     "KK" = c(1, 2, NA, NA, 5, 6))

The resulting data frame is:

  ID II JJ KK
   A NA  1  1
   A NA  2  2
   A NA  3 NA
   B  1 NA NA
   B  2 NA  5
   B  3 NA  6

I want to split the data frame by ID and, within each ID, replace every column that contains only NA values with 999. The result should be as follows:

 ID  II  JJ KK
  A 999   1  1
  A 999   2  2
  A 999   3 NA
  B   1 999 NA
  B   2 999  5
  B   3 999  6

I have tried this code:

   library(dplyr)
   df%>%
   group_by(ID)%>%
   select(II, JJ,KK)%>%
   mutate_if(df[, colSums(is.na(df)) == nrow(df)]<999)

I am getting the following error:

 Adding missing grouping variables: `ID`
 Error in tbl_if_vars(.tbl, .p, .env, ..., .include_group_vars = 
 .include_group_vars) : 
 length(.p) == length(tibble_vars) is not TRUE

Could someone take a look? I am unable to figure it out. The solution does not need to be dplyr-based.
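Two things seem to go wrong in the attempt above. First, mutate_if() expects .p to be a predicate function (or a logical vector with one value per column), but the code passes a subset of df instead, which appears to be why the length check fails. Second, even on its own, colSums(is.na(df)) == nrow(df) counts NAs over the whole data frame and ignores ID, so no column qualifies. The base R sketch below contrasts that whole-frame check with a per-ID check; it re-creates the question's df so it runs on its own, and the names whole and per_group are illustrative only.

```r
# Re-create the question's data so the snippet is self-contained
df <- data.frame(ID = c("A", "A", "A", "B", "B", "B"),
                 II = c(NA, NA, NA, 1, 2, 3),
                 JJ = c(1, 2, 3, NA, NA, NA),
                 KK = c(1, 2, NA, NA, 5, 6))

# Whole-frame check (as in the attempt): NAs are counted over all six rows,
# so no column is entirely NA and every entry comes out FALSE
whole <- colSums(is.na(df)) == nrow(df)
print(whole)

# Per-group check: split the value columns (all but ID) by ID and test each
# piece separately; the result is a logical matrix with one row per column
# (II, JJ, KK) and one column per ID (A, B)
per_group <- sapply(split(df[-1], df$ID),
                    function(g) colSums(is.na(g)) == nrow(g))
print(per_group)
```

Only the per-group check flags II for ID "A" and JJ for ID "B", which matches the desired output.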

asked Oct 15 '25 14:10 by Raghavan vmvs


2 Answers

We can use all() to detect groups in which a column is entirely NA. Since we want to apply this to every column, we can use mutate_all(), where funs() (i.e. the function applied to each column) is a simple replace() that sets a column's values within a group to 999 whenever all() of them are NA.
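The key step is that all(is.na(.)) reduces a group's column slice to a single TRUE or FALSE, and replace() then recycles that single index over the whole slice. A minimal base R illustration, where all_na and some_na are illustrative vectors standing in for the slices of II and KK within ID "A":

```r
# Slice in which every value is NA (like column II for ID "A")
all_na <- c(NA_real_, NA_real_, NA_real_)
# Slice with at least one observed value (like column KK for ID "A")
some_na <- c(1, 2, NA)

# replace(x, TRUE, 999) overwrites every element;
# replace(x, FALSE, 999) leaves x unchanged
replace(all_na, all(is.na(all_na)), 999)
# [1] 999 999 999
replace(some_na, all(is.na(some_na)), 999)
# [1]  1  2 NA
```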

library(tidyverse)

df %>% 
 group_by(ID) %>% 
 mutate_all(funs(replace(., all(is.na(.)), 999)))

which gives:

# A tibble: 6 x 4
# Groups:   ID [2]
  ID       II    JJ    KK
  <fct> <dbl> <dbl> <dbl>
1 A       999     1     1
2 A       999     2     2
3 A       999     3    NA
4 B         1   999    NA
5 B         2   999     5
6 B         3   999     6
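funs() has been deprecated since dplyr 0.8.0, and since dplyr 1.0.0 mutate_all() is superseded by across(). Assuming dplyr 1.0.0 or newer is installed, the same replace()/all() idea can be written as below. This is an updated sketch of the approach above, not the answerer's original code; res is an illustrative name.

```r
library(dplyr)

df <- data.frame(ID = c("A", "A", "A", "B", "B", "B"),
                 II = c(NA, NA, NA, 1, 2, 3),
                 JJ = c(1, 2, 3, NA, NA, NA),
                 KK = c(1, 2, NA, NA, 5, 6))

res <- df %>%
  group_by(ID) %>%
  # In a grouped mutate(), across(everything()) skips the grouping column ID;
  # each column slice is set to 999 only if all its values in the group are NA
  mutate(across(everything(), ~ replace(.x, all(is.na(.x)), 999))) %>%
  ungroup()

print(res)
```

Calling ungroup() at the end is optional; it just drops the "# Groups: ID [2]" line seen in the output above.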
answered Oct 17 '25 05:10 by Sotos


To chip in a data.table solution as well:

library(data.table)
setDT(df)
df[, lapply(.SD, 
            function(col) if (all(is.na(col))) 999 else col), 
     by = ID]
#    ID  II  JJ KK
# 1:  A 999   1  1
# 2:  A 999   2  2
# 3:  A 999   3 NA
# 4:  B   1 999 NA
# 5:  B   2 999  5
# 6:  B   3 999  6

Within each ID group, .SD holds all non-ID columns. We loop over them with lapply and replace a column by 999 if all of its values are NA.

answered Oct 17 '25 05:10 by thothal
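Since the question notes that a dplyr-based solution is not essential, the same per-group rule can also be sketched in plain base R. This sketch is not from either answer. It re-creates the question's df (setDT() above converts df in place) and uses ave(), which applies a function within each group and returns a vector aligned with the original rows; value_cols is an illustrative name.

```r
df <- data.frame(ID = c("A", "A", "A", "B", "B", "B"),
                 II = c(NA, NA, NA, 1, 2, 3),
                 JJ = c(1, 2, 3, NA, NA, NA),
                 KK = c(1, 2, NA, NA, 5, 6))

# Columns to process: everything except the grouping column
value_cols <- setdiff(names(df), "ID")

# For each value column, ave() runs FUN separately on each ID group;
# FUN returns a full-length vector of 999s when the group's values are all NA
df[value_cols] <- lapply(df[value_cols], function(col) {
  ave(col, df$ID,
      FUN = function(x) if (all(is.na(x))) rep(999, length(x)) else x)
})

print(df)
```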