I have created the following dataframe in R
df<-data.frame("ID"=c("A", "A", "A", "B", "B", "B"),"II"=c(NA, NA,
NA,1,2,3),"JJ"=c(1,2,3, NA, NA, NA), 'KK'=c(1,2,NA, NA, 5,6))
The dataframe generated is as follows
ID II JJ KK
A NA 1 1
A NA 2 2
A NA 3 NA
B 1 NA NA
B 2 NA 5
B 3 NA 6
I want to subset the dataframe by the ID values and replace all columns that contain only NA values with 999. the result should be as folows
ID II JJ KK
A 999 1 1
A 999 2 2
A 999 3 NA
B 1 999 NA
B 2 999 5
B 3 999 6
I have tried this code
library(dplyr)
df%>%
group_by(ID)%>%
select(II, JJ,KK)%>%
mutate_if(df[, colSums(is.na(df)) == nrow(df)]<999)
I am getting the following error
Adding missing grouping variables: `ID`
Error in tbl_if_vars(.tbl, .p, .env, ..., .include_group_vars =
.include_group_vars) :
length(.p) == length(tibble_vars) is not TRUE
I request someone to take a look. I am unable to figure it out. Not essential for dplyr based solution
We can use all
to capture groups with all NAs. Since we want to replace the NA
s in all columns, then we can use mutate_all
, where the funs
(i.e. function to be applied to all the columns) is a simple replace()
of the groups where all()
values are NA. The replacement is 999
.
library(tidyverse)
df %>%
group_by(ID) %>%
mutate_all(funs(replace(., all(is.na(.)), 999)))
which gives,
# A tibble: 6 x 4 # Groups: ID [2] ID II JJ KK <fct> <dbl> <dbl> <dbl> 1 A 999 1 1 2 A 999 2 2 3 A 999 3 NA 4 B 1 999 NA 5 B 2 999 5 6 B 3 999 6
To chip in a data.table
solution as well:
library(data.table)
setDT(df)
df[, lapply(.SD,
function(col) if (all(is.na(col))) 999 else col),
by = ID]
# ID II JJ KK
# 1: A 999 1 1
# 2: A 999 2 2
# 3: A 999 3 NA
# 4: B 1 999 NA
# 5: B 2 999 5
# 6: B 3 999 6
We loop through all non ID
columns with lapply
and replace them by 999
if all of the are NA
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With