I have test data that looks like this:
Group Value
1 a 1
2 a 2
3 a 3
4 a 4
5 b 5
6 b 2
7 b 3
8 c 6
9 c 7
10 c 8
11 c 3
12 c 6
13 d 9
14 d 10
15 e 9
I am trying to create a vectorized approach, preferably using tidyverse
tools that will create an additional column that notes if the Value is present in the previous grouping. Here is an example of how this would look like:
Group Value In_Last_Group
1 a 1 FALSE
2 a 2 FALSE
3 a 3 FALSE
4 a 4 FALSE
5 b 5 FALSE
6 b 2 TRUE
7 b 3 TRUE
8 c 6 FALSE
9 c 7 FALSE
10 c 8 FALSE
11 c 3 TRUE
12 c 5 TRUE
13 d 9 FALSE
14 d 10 FALSE
15 e 9 TRUE
I have a way to do this using a standard for loop, but I have a large dataset and I believe it would be much faster if it was vectorized. Any help would be appreciated.
Here is the dput
of the test data:
structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d",
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7,
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA,
-15L), class = "data.frame")
We can nest
after grouping by 'Group', then remove the first and last elements of the 'data' column, use map2
to do a comparison of corresponding elements and then append with the FALSE
elements for the first group
library(dplyr)
library(purrr)
df2 <- df1 %>%
group_by(Group) %>%
nest
flag <- map2(df2$data[-1], df2$data[-nrow(df2)], ~
.x$Value %in% .y$Value) %>%
unlist
df1$Last_Group <- c(rep(FALSE, nrow(df2$data[[1]])), flag)
You can use a join to lookup values in the previous group to see if those values exists. It should be faster than looping through the groups. I am not familiar with tidyverse
but here is an implementation in data.table
(which should also be faster than tidyverse
if your data is large enough):
library(data.table)
setDT(DF)
DF[, c("g", "pg") := .(r <- rleid(Group), r - 1L)]
DF[, ilg := FALSE][DF, on=.(pg=g, Value), ilg := TRUE]
output (note that there is a typo for Value in row 12 of OP's desired output):
Group Value g pg ilg
1: a 1 1 0 FALSE
2: a 2 1 0 FALSE
3: a 3 1 0 FALSE
4: a 4 1 0 FALSE
5: b 5 2 1 FALSE
6: b 2 2 1 TRUE
7: b 3 2 1 TRUE
8: c 6 3 2 FALSE
9: c 7 3 2 FALSE
10: c 8 3 2 FALSE
11: c 3 3 2 TRUE
12: c 6 3 2 FALSE
13: d 9 4 3 FALSE
14: d 10 4 3 FALSE
15: e 9 5 4 TRUE
data:
DF <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d",
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7,
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA,
-15L), class = "data.frame")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With