Sample data
df = structure(list(class = structure(c(4L, 1L, 1L, 3L, 2L), .Label = c("apple",
"berry", "grape", "orange"), class = "factor"), value = c(NA,
NA, 1, 1, NA)), .Names = c("class", "value"), row.names = c(NA,
-5L), class = "data.frame")
It looks like this:
class value
1 orange NA
2 apple NA
3 apple 1
4 grape 1
5 berry NA
How can I remove a row with NA in a group only if that group has another, non-NA value?
Desired output:
class value
1 orange NA
2 apple 1
3 grape 1
4 berry NA
This is doable in three steps using subset and merge, but I am interested in a data.table approach.
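For reference, the question does not show the three steps it mentions. One possible reading, sketched in base R only, is: keep the non-NA rows, collect the distinct classes, then merge the two with all.x = TRUE so that classes which lost every row come back with NA. The names non_na, all_classes and result are illustrative and not from the original post. Note that this sketch collapses an all-NA class to a single row and returns the rows sorted by class.

```r
# Sample data from the question, rebuilt with data.frame()
df <- data.frame(
  class = factor(c("orange", "apple", "apple", "grape", "berry")),
  value = c(NA, NA, 1, 1, NA)
)

# Step 1: keep only the rows that carry a value
non_na <- subset(df, !is.na(value))

# Step 2: one row per class that appears anywhere in the data
all_classes <- unique(df["class"])

# Step 3: left join, so classes with no surviving row get value = NA
result <- merge(all_classes, non_na, by = "class", all.x = TRUE)
result
```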
Try dplyr. It yields simpler code and is fast, even for large data frames:
library(dplyr)
df %>%
  group_by(class) %>%
  # drop an NA row only when its group also has at least one non-NA value
  filter(!(is.na(value) & sum(!is.na(value)) > 0)) %>%
  ungroup()
The ungroup step at the end is only there so that you don't end up with a grouped data frame (a grouped dplyr tbl, actually).
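To see what that filter condition does without loading dplyr, the same logic can be sketched in base R. Here ave() broadcasts a per-group "has at least one non-NA value" flag back onto every row. The names has_value and kept are illustrative, and the data is rebuilt with character classes for brevity.

```r
# Sample data from the question (class as character for simplicity)
df <- data.frame(
  class = c("orange", "apple", "apple", "grape", "berry"),
  value = c(NA, NA, 1, 1, NA)
)

# TRUE on every row whose class has at least one non-NA value
has_value <- ave(!is.na(df$value), df$class, FUN = any)

# Drop a row only if it is NA *and* its class has a real value elsewhere
kept <- df[!(is.na(df$value) & has_value), ]
kept
```

Unlike a join-based solution, this keeps the original row order and keeps every row of a class whose values are all NA.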
Here is a different data.table approach:
library(data.table)
setDT(df)           # setkey() needs a data.table, so convert by reference first
setkey(df, class)
df[!is.na(value)][J(unique(df$class))]
# class value
# 1: apple 1
# 2: berry NA
# 3: grape 1
# 4: orange NA
This works thanks to the default nomatch=NA, which returns a row filled with NA for every key that has no match. Type ?data.table into the console for details.
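A small sketch of the difference nomatch makes, using on= instead of a key so it stands alone. It assumes a data.table version that accepts nomatch = NULL (1.12.0 or later); the names dt, non_na, keys, with_na and inner are illustrative.

```r
library(data.table)

dt <- data.table(
  class = c("orange", "apple", "apple", "grape", "berry"),
  value = c(NA, NA, 1, 1, NA)
)

non_na <- dt[!is.na(value)]   # only apple and grape survive
keys   <- unique(dt$class)    # every class, in order of first appearance

# Default nomatch = NA: unmatched classes come back with value = NA
with_na <- non_na[.(keys), on = "class"]

# nomatch = NULL: unmatched classes are dropped (inner join)
inner <- non_na[.(keys), on = "class", nomatch = NULL]
```

With the default, with_na has one row per class; with nomatch = NULL, inner only has the classes that still had a value.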
We could use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df). Grouped by 'class', an if/else condition checks whether 'value' has any non-NA elements, and we subset .SD accordingly:
library(data.table)
setDT(df)[, if(any(!is.na(value))) .SD[!is.na(value)] else .SD , by = class]
# class value
#1: orange NA
#2: apple 1
#3: grape 1
#4: berry NA
Or we can switch from any to all by slightly modifying the condition:
setDT(df)[, if(all(is.na(value))) .SD else .SD[!is.na(value)], by = class]
# class value
#1: orange NA
#2: apple 1
#3: grape 1
#4: berry NA
Or we can get the row index (.I) and then subset the dataset:
indx <- setDT(df)[, if(any(!is.na(value))) .I[!is.na(value)] else .I, class]$V1
df[indx]
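One thing to keep in mind with the index approach: the indices come back group by group, so when a class is not contiguous in the data the result follows group order rather than the original row order. A minimal sketch on made-up data (dt and its values are hypothetical, not the question's sample) shows that sorting the index restores the original order:

```r
library(data.table)

# Hypothetical data where the classes are interleaved
dt <- data.table(
  class = c("apple", "orange", "apple", "orange"),
  value = c(NA, NA, 2, NA)
)

indx <- dt[, if (any(!is.na(value))) .I[!is.na(value)] else .I, by = class]$V1

dt[indx]         # rows in group order: apple first, then both orange rows
dt[sort(indx)]   # rows in their original order
```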
You could make a temp variable of all the classes with NAs, then take out all NAs and add back any classes that were entirely removed:
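library(data.table)
setDT(df)                           # convert by reference
# one NA row per class; unique() avoids a length mismatch when a class has several NAs
temp <- unique(df[is.na(value)])
df <- df[!is.na(value)]             # drop every NA row
# add back the classes that were removed entirely
df <- rbindlist(list(df, temp[!class %in% df[, class]]))
rm(temp)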