I have created the following dataframe in R
df<-data.frame("ID"=c("A", "A", "A", "B", "B", "B"),"II"=c(NA, NA,
NA,1,2,3),"JJ"=c(1,2,3, NA, NA, NA), 'KK'=c(1,2,NA, NA, 5,6))
The dataframe generated is as follows
ID II JJ KK
A NA 1 1
A NA 2 2
A NA 3 NA
B 1 NA NA
B 2 NA 5
B 3 NA 6
I want to subset the dataframe by the ID values and replace all columns that contain only NA values with 999. the result should be as folows
ID II JJ KK
A 999 1 1
A 999 2 2
A 999 3 NA
B 1 999 NA
B 2 999 5
B 3 999 6
I have tried this code
library(dplyr)
df%>%
group_by(ID)%>%
select(II, JJ,KK)%>%
mutate_if(df[, colSums(is.na(df)) == nrow(df)]<999)
I am getting the following error
Adding missing grouping variables: `ID`
Error in tbl_if_vars(.tbl, .p, .env, ..., .include_group_vars =
.include_group_vars) :
length(.p) == length(tibble_vars) is not TRUE
I request someone to take a look. I am unable to figure it out. Not essential for dplyr based solution
We can use all to capture groups with all NAs. Since we want to replace the NAs in all columns, then we can use mutate_all, where the funs (i.e. function to be applied to all the columns) is a simple replace() of the groups where all() values are NA. The replacement is 999.
library(tidyverse)
df %>%
group_by(ID) %>%
mutate_all(funs(replace(., all(is.na(.)), 999)))
which gives,
# A tibble: 6 x 4 # Groups: ID [2] ID II JJ KK <fct> <dbl> <dbl> <dbl> 1 A 999 1 1 2 A 999 2 2 3 A 999 3 NA 4 B 1 999 NA 5 B 2 999 5 6 B 3 999 6
To chip in a data.table solution as well:
library(data.table)
setDT(df)
df[, lapply(.SD,
function(col) if (all(is.na(col))) 999 else col),
by = ID]
# ID II JJ KK
# 1: A 999 1 1
# 2: A 999 2 2
# 3: A 999 3 NA
# 4: B 1 999 NA
# 5: B 2 999 5
# 6: B 3 999 6
We loop through all non ID columns with lapply and replace them by 999 if all of the are NA.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With