
Matching values based on group ID

Tags: r, match

Suppose I have the following data frame (the actual one is a very large dataset):

df<- structure(list(x = c(1, 1, 1, 2, 2, 3, 3, 3), y = structure(c(1L, 
6L, NA, 2L, 4L, 3L, 7L, 5L), .Label = c("all", "fall", "hello", 
"hi", "me", "non", "you"), class = "factor"), z = structure(c(5L, 
NA, 4L, 2L, 1L, 6L, 3L, 4L), .Label = c("fall", "hi", "me", "mom", 
"non", "you"), class = "factor")), .Names = c("x", "y", "z"), row.names = c(NA, 
-8L), class = "data.frame")

It looks like this:

>df
  x     y    z
1 1   all  non
2 1   non <NA>
3 1  <NA>  mom
4 2  fall   hi
5 2    hi fall
6 3 hello  you
7 3   you   me
8 3    me  mom

What I am trying to do is count, within each group of x (1, 2, or 3), the number of values in y that also appear in z. For example, group 1 has one matched value, "non" (NA should be ignored). The desired output looks like:

  x    n
1 1    1
2 2    2
3 3    2

I would like to avoid a for loop because the dataset is large, but I couldn't find a way to do it.

mallet asked Jul 03 '15 00:07


3 Answers

Using dplyr:

library(dplyr)

df %>% group_by(x) %>%
       summarise(n = sum(y %in% na.omit(z)))
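The na.omit() call matters because %in% treats an NA on the left as a match for an NA on the right. A minimal sketch of that behaviour, using the group 1 values from the question as plain character vectors (the names naive and fixed are only for illustration):

```r
# Values of y and z for group x == 1 in the question
y <- c("all", "non", NA)
z <- c("non", NA, "mom")

# Without na.omit(), the NA in y is counted as a match for the NA in z
naive <- sum(y %in% z)

# Dropping NA from z first leaves only the genuine match, "non"
fixed <- sum(y %in% na.omit(z))

print(c(naive = naive, fixed = fixed))
```

Here naive counts 2 and fixed counts 1, which is the value the question expects for group 1.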
jeremycg answered Nov 06 '22 13:11


Just for nightly fun I tried a base R solution, which of course is ugly as hell.

ind <- by(df, df$x, function(x) which(na.omit(x[["y"]]) %in% na.omit(x[["z"]])))
sm <- lapply(ind, length)
cbind(unique(df$x), sm)
sm
1 1 1 
2 2 2 
3 3 2 

Another base R approach, with less code (and with less ugliness I hope):

ind <- by(df, df$x, function(x) sum(na.omit(x[["y"]]) %in% na.omit(x[["z"]])))
cbind(unique(df$x), ind)
    ind
1 1   1
2 2   2
3 3   2
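If a plain data frame in the shape of the desired output is preferred over the cbind() matrix, one possible sketch uses split() and vapply(). The data frame is rebuilt here so the snippet runs on its own, and the names counts and result are just illustrative:

```r
# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3),
  y = c("all", "non", NA, "fall", "hi", "hello", "you", "me"),
  z = c("non", NA, "mom", "hi", "fall", "you", "me", "mom"),
  stringsAsFactors = TRUE
)

# One count per group: how many y values also occur in that group's z
counts <- vapply(
  split(df, df$x),
  function(g) sum(g$y %in% na.omit(g$z)),
  integer(1)
)

# Assemble the result as a data frame with columns x and n
result <- data.frame(x = as.numeric(names(counts)), n = unname(counts))
print(result)
```

vapply() checks that every group yields exactly one integer, so a surprise in the data fails loudly rather than producing a list.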
SabDeM answered Nov 06 '22 13:11


Here's a solution using by() and match():

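do.call(rbind, by(df, df$x, function(g) {
  # match() gives NA where a y value has no counterpart in z;
  # incomparables = NA (abbreviated to inc in the original one-liner) keeps NA in y from matching NA in z
  c(x = g$x[1], n = sum(!is.na(match(g$y, g$z, incomparables = NA))))
}))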
##   x n
## 1 1 1
## 2 2 2
## 3 3 2
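To see what match() contributes here, it may help to look at the positions it returns for a single group. This is only a sketch using the group 1 values as character vectors; default_pos and strict_pos are illustrative names:

```r
# Values of y and z for group x == 1 in the question
y <- c("all", "non", NA)
z <- c("non", NA, "mom")

# By default NA is matchable, so the NA in y is found at position 2 of z
default_pos <- match(y, z)

# With incomparables = NA, an NA in y is never matched
strict_pos <- match(y, z, incomparables = NA)

# Counting the non-NA positions gives the number of matched values
n_matched <- sum(!is.na(strict_pos))
print(n_matched)
```

default_pos comes out as NA, 1, 2, while strict_pos is NA, 1, NA, so only "non" is counted.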
bgoldst answered Nov 06 '22 12:11