
Remove rows with NA in a group, given the group contains at least one non-NA value

Tags:

r

data.table

Sample data

df = structure(list(class = structure(c(4L, 1L, 1L, 3L, 2L), .Label = c("apple", 
"berry", "grape", "orange"), class = "factor"), value = c(NA, 
NA, 1, 1, NA)), .Names = c("class", "value"), row.names = c(NA, 
-5L), class = "data.frame")

which looks like:

   class value
1 orange    NA
2  apple    NA
3  apple     1
4  grape     1
5  berry    NA

How can I remove rows with NA in a group only if that group also has at least one non-NA value?

Desired output:

   class value
1 orange    NA
2  apple     1
3  grape     1
4  berry    NA

This is doable in three steps using subset and merge, but I am interested in a data.table approach.
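The question doesn't show the three-step subset-and-merge version. A minimal base R sketch of what it might look like follows; the split into steps is a guess, and the sample data is rebuilt with `class` as a character column for simplicity. Note that `merge()` sorts the result by `class`, and a class with several NA rows would come back as a single NA row.

```r
# Sample data from the question, rebuilt with 'class' as character
df <- data.frame(class = c("orange", "apple", "apple", "grape", "berry"),
                 value = c(NA, NA, 1, 1, NA),
                 stringsAsFactors = FALSE)

# Step 1: keep only the rows that have a value
non_na <- subset(df, !is.na(value))

# Step 2: one row per class, so that every class is represented
all_classes <- data.frame(class = unique(df$class), stringsAsFactors = FALSE)

# Step 3: left join; classes with no non-NA row get value = NA
result <- merge(all_classes, non_na, by = "class", all.x = TRUE)
print(result)
```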

asked Jul 16 '15 19:07 by Veerendra Gadekar


4 Answers

Try dplyr. It gives concise code and performs well, even on large data frames:

library(dplyr)

df %>%
    group_by(class) %>%
    # drop a row only if it is NA and its class has at least one non-NA value
    filter(!(is.na(value) & sum(!is.na(value)) > 0)) %>%
    ungroup()

The ungroup() call at the end is only there so that you don't end up with a grouped data frame (a dplyr grouped_df tibble).
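The same per-group test can be sketched in base R without dplyr. Using `ave()` here is a substitution for illustration, not part of the answer: it broadcasts each class's count of non-NA values back onto that class's rows, and the filter condition is then applied row by row.

```r
df <- data.frame(class = c("orange", "apple", "apple", "grape", "berry"),
                 value = c(NA, NA, 1, 1, NA),
                 stringsAsFactors = FALSE)

# For every row: how many non-NA values its class has
n_non_na <- ave(as.integer(!is.na(df$value)), df$class, FUN = sum)

# Same condition as the dplyr filter: drop NA rows in classes that have a value
keep <- !(is.na(df$value) & n_non_na > 0)
result <- df[keep, ]
print(result)
```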

answered Oct 21 '22 03:10 by Felipe Gerard


Here is a different data.table approach:

library(data.table)
setDT(df)            # setkey() needs a data.table, not a data.frame
setkey(df, class)
df[!is.na(value)][J(unique(df$class))]

#     class value
# 1:  apple     1
# 2:  berry    NA
# 3:  grape     1
# 4: orange    NA

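This works because of the default nomatch = NA: a class with no non-NA rows still finds no match in the join and comes back as a single row with value NA. Type ?data.table into the console for details.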
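To see what `nomatch` is doing, here is a sketch that uses an `on =` join instead of `setkey()` (a substitution for illustration, assuming the data.table package is installed) and compares the default with `nomatch = 0L`, which turns the lookup into an inner join:

```r
library(data.table)

dt <- data.table(class = c("orange", "apple", "apple", "grape", "berry"),
                 value = c(NA, NA, 1, 1, NA))

# Lookup table listing every class once
classes <- data.table(class = unique(dt$class))

# Default nomatch = NA: classes missing from the non-NA subset still appear,
# with value filled in as NA
with_na <- dt[!is.na(value)][classes, on = "class"]

# nomatch = 0L drops the unmatched classes instead
dropped <- dt[!is.na(value)][classes, on = "class", nomatch = 0L]

print(with_na)
print(dropped)
```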

answered Oct 21 '22 02:10 by Frank


We could use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'class', an if/else checks whether 'value' has any non-NA element: if it does, we keep only the non-NA rows of .SD; otherwise we return .SD unchanged.

library(data.table)
setDT(df)[, if(any(!is.na(value))) .SD[!is.na(value)] else .SD , by = class]
#    class value
#1: orange    NA
#2:  apple     1
#3:  grape     1
#4:  berry    NA

Or we can write the same test with all() instead of any(), swapping the two branches:

setDT(df)[, if(all(is.na(value))) .SD else .SD[!is.na(value)], by = class]
#    class value
#1: orange    NA
#2:  apple     1
#3:  grape     1
#4:  berry    NA

Or we can collect the row indices (.I) to keep in each group and then subset the dataset with them:

indx <- setDT(df)[, if(any(!is.na(value))) .I[!is.na(value)] else .I, class]$V1
df[indx]
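The row-index idea carries over to base R. In this sketch (an illustration, not part of the answer), `split()` on the row numbers stands in for `.I` within each group; since `split()` returns the groups in alphabetical order, the collected indices are sorted to restore the original row order.

```r
df <- data.frame(class = c("orange", "apple", "apple", "grape", "berry"),
                 value = c(NA, NA, 1, 1, NA),
                 stringsAsFactors = FALSE)

# Row numbers grouped by class (the base R counterpart of .I per group)
rows_by_class <- split(seq_len(nrow(df)), df$class)

# Per group: the non-NA rows if there are any, otherwise all rows
indx <- unlist(lapply(rows_by_class, function(i) {
  ok <- !is.na(df$value[i])
  if (any(ok)) i[ok] else i
}), use.names = FALSE)

# Sort so the rows come back in their original order
result <- df[sort(indx), ]
print(result)
```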
answered Oct 21 '22 03:10 by akrun


You could set aside the rows with NA in a temporary table, drop all NA rows, and then add back the NA rows of any class that was removed entirely.

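library(data.table)
df <- setDT(df)
# set aside the NA rows, then drop them from df
temp <- df[is.na(value)]
df <- df[!is.na(value)]
# add back NA rows only for classes that no longer appear in df
df <- rbindlist(list(df, temp[!class %in% df[, class]]))
rm(temp)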
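The same set-aside-and-add-back idea works in base R. This sketch extends the sample with a second NA row for "orange" (made-up data for illustration) to show that every NA row of an all-NA class comes back, while the NA row of "apple" stays dropped:

```r
# 'orange' has two NA rows here, added to exercise that case
df <- data.frame(class = c("orange", "orange", "apple", "apple", "grape", "berry"),
                 value = c(NA, NA, NA, 1, 1, NA),
                 stringsAsFactors = FALSE)

na_rows <- df[is.na(df$value), ]   # rows set aside
kept <- df[!is.na(df$value), ]     # rows with a value

# Add back NA rows only for classes that vanished entirely
result <- rbind(kept, na_rows[!na_rows$class %in% kept$class, ])
print(result)
```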
answered Oct 21 '22 03:10 by Dean MacGregor