Remove group from data.frame if at least one group member meets condition

I have a data.frame where I'd like to remove entire groups if any of their members meets a condition.

In this first example, where the values are numeric and the condition is NA, the code below works.

library(plyr)

df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2),
                     place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
                     group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
                .Names = c("world", "place", "group"),
                row.names = c(NA, -10L), class = "data.frame")

# mean() returns NA for any group that contains an NA
ans <- ddply(df, .(group), summarize, code = mean(world))
ans$code[is.na(ans$code)] <- 0
ans2 <- merge(df, ans)
final.ans <- ans2[ans2$code != 0, ]

However, this ddply maneuver relies on the group mean becoming NA, so it will not work if the condition is something other than NA, or if the values are non-numeric.

For example, if I wanted to remove groups which have one or more rows with a world value of AF (as in the data frame below), this ddply trick would not work.

df2 <- structure(list(world = structure(c(1L, 2L, 3L, 3L, 3L, 5L, 1L, 4L, 2L, 4L),
                                        .Label = c("AB", "AC", "AD", "AE", "AF"),
                                        class = "factor"),
                      place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
                      group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
                 .Names = c("world", "place", "group"),
                 row.names = c(NA, -10L), class = "data.frame")

I can envision a for-loop where, for each group, the value of each member is checked, and if the condition is met a code column is populated; a subset could then be made based on that code.
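
A rough sketch of that loop in base R might look like the following. The `flagged` copy, the `code` column, and its 0/1 values are illustrative names only, and `df2` is rebuilt with `data.frame()` just so the snippet runs on its own.

```r
# Same data as df2, rebuilt so the snippet is self-contained.
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Work on a copy so df2 itself is left untouched.
flagged <- df2
flagged$code <- 0

# Check each group in turn; flag every member if any member is "AF".
for (g in unique(flagged$group)) {
  members <- flagged$group == g
  if (any(flagged$world[members] == "AF")) {
    flagged$code[members] <- 1
  }
}

# Keep only rows from groups that were never flagged.
loop.ans <- flagged[flagged$code == 0, c("world", "place", "group")]
```

This works, but it needs an explicit pass over the groups plus a helper column.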

But perhaps there is a vectorized, idiomatic R way to do this?

nofunsally asked Jul 27 '15 19:07



2 Answers

Try

library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))

Or, as mentioned by @akrun, with data.table:

library(data.table)
setDT(df2)[, if (!any(world == "AF")) .SD, group]

Or

setDT(df2)[, if(all(world != "AF")) .SD, group]

Which gives:

#Source: local data frame [7 x 3]
#Groups: group
#
#  world place group
#1    AB     1     1
#2    AC     1     1
#3    AD     2     1
#4    AB     1     3
#5    AE     2     3
#6    AC     3     3
#7    AE     1     3
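
For readers without dplyr or data.table, the same grouped test can probably be expressed in base R with `ave()`. This is a minimal sketch and not part of the original answer: `has_af`, `has_na`, `world_num`, and `base.ans` are made-up names, and `df2` is rebuilt so the snippet runs on its own.

```r
# Same data as df2 in the question, rebuilt so the snippet is self-contained.
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# ave() applies FUN within each group and returns one value per row,
# so has_af is TRUE for every row whose group contains an "AF".
has_af <- ave(df2$world == "AF", df2$group, FUN = any)
base.ans <- df2[!has_af, ]

# The NA case from the question's first data frame: only the predicate changes.
world_num <- c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2)
has_na <- ave(is.na(world_num), df2$group, FUN = any)
```

Here `base.ans` keeps the same seven rows as the output above (groups 1 and 3), and `has_na` marks every row of group 2, the only group containing an NA.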
Steven Beaupré answered Sep 28 '22 09:09



An alternative data.table solution:

library(data.table)
setDT(df2)
df2[!(group %in% df2[world == "AF", group])]

gives:

   world place group
1:    AB     1     1
2:    AC     1     1
3:    AD     2     1
4:    AB     1     3
5:    AE     2     3
6:    AC     3     3
7:    AE     1     3

Using keys we can be a bit faster:

setkey(df2, group)
df2[!J(df2[world == "AF", group])]
Chris answered Sep 28 '22 09:09
