Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditionally determining value of column by looking at last group

Tags:

r

dplyr

tidyverse

I have test data that looks like this:

   Group Value
1      a     1
2      a     2
3      a     3
4      a     4
5      b     5
6      b     2
7      b     3
8      c     6
9      c     7
10     c     8
11     c     3
12     c     6
13     d     9
14     d    10
15     e     9

I am trying to create a vectorized approach, preferably using tidyverse tools that will create an additional column that notes if the Value is present in the previous grouping. Here is an example of how this would look like:

   Group Value In_Last_Group
1      a     1         FALSE
2      a     2         FALSE
3      a     3         FALSE
4      a     4         FALSE
5      b     5         FALSE
6      b     2          TRUE
7      b     3          TRUE
8      c     6         FALSE
9      c     7         FALSE
10     c     8         FALSE
11     c     3          TRUE
12     c     5          TRUE
13     d     9         FALSE
14     d    10         FALSE
15     e     9          TRUE

I have a way to do this using a standard for loop, but I have a large dataset and I believe it would be much faster if it was vectorized. Any help would be appreciated.

Here is the dput of the test data:

structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA, 
-15L), class = "data.frame")
like image 524
sumshyftw Avatar asked Mar 26 '20 20:03

sumshyftw


Video Answer


2 Answers

We can nest after grouping by 'Group', then remove the first and last elements of the 'data' column, use map2 to do a comparison of corresponding elements and then append with the FALSE elements for the first group

library(dplyr)
library(purrr)
df2 <- df1 %>%
         group_by(Group) %>%
         nest

flag <-  map2(df2$data[-1], df2$data[-nrow(df2)], ~ 
      .x$Value %in% .y$Value) %>%
      unlist
df1$Last_Group <- c(rep(FALSE, nrow(df2$data[[1]])), flag)
like image 51
akrun Avatar answered Sep 24 '22 03:09

akrun


You can use a join to lookup values in the previous group to see if those values exists. It should be faster than looping through the groups. I am not familiar with tidyverse but here is an implementation in data.table (which should also be faster than tidyverse if your data is large enough):

library(data.table)
setDT(DF)
DF[, c("g", "pg") := .(r <- rleid(Group), r - 1L)]
DF[, ilg := FALSE][DF, on=.(pg=g, Value), ilg := TRUE]

output (note that there is a typo for Value in row 12 of OP's desired output):

    Group Value g pg   ilg
 1:     a     1 1  0 FALSE
 2:     a     2 1  0 FALSE
 3:     a     3 1  0 FALSE
 4:     a     4 1  0 FALSE
 5:     b     5 2  1 FALSE
 6:     b     2 2  1  TRUE
 7:     b     3 2  1  TRUE
 8:     c     6 3  2 FALSE
 9:     c     7 3  2 FALSE
10:     c     8 3  2 FALSE
11:     c     3 3  2  TRUE
12:     c     6 3  2 FALSE
13:     d     9 4  3 FALSE
14:     d    10 4  3 FALSE
15:     e     9 5  4  TRUE

data:

DF <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA, 
-15L), class = "data.frame")
like image 22
chinsoon12 Avatar answered Sep 22 '22 03:09

chinsoon12