Obtain a Unique ID by Group in mutate pipeline

Tags: r, dplyr

Since the dplyr v1.0.0 update came out, I have noticed that the ... argument of group_indices() is deprecated. I use this function a lot in my work, and I like to use it inside a mutate().

For example, with dplyr v0.8.3 I was able to do something like this very easily:

#Note that I have not run this code as I no longer have v0.8.3 on my machine.

library(dplyr) # v0.8.3
rep_data <- data.frame(
  x = c("a", "a", "a", "a", "b", "b", "b", "c"),
  y = c("v1", "v1", "v2", "v3", "v1", "v2", "v3", "v3"),
  expect_output = c(1, 1, 2, 3, 4, 5, 6, 7)
)
rep_data %>%
  mutate(expect_output2 = group_indices(., x, y))

expect_output2 should give effectively the same results as expect_output.

Now that the ... argument is deprecated, I'd like to move away from using it, but I'm unsure how to do the same thing as above.

I'm basically asking the same thing as the question HERE, but that question is now outdated with the new dplyr version.

When I run the code above using dplyr v1.0.0 I get the warning message:

Warning message:
The `...` argument of `group_keys()` is deprecated as of dplyr 1.0.0.
Please `group_by()` first

So I tried the following:

library(dplyr) # v1.0.0
rep_data %>% 
  group_by(x, y) %>% 
  mutate(expect_output3 = group_indices(.))

This results in an error:

Error: Problem with `mutate()` input `expect_output3`.
x Input `expect_output3` can't be recycled to size 2.
i Input `expect_output3` is `group_indices(.)`.
i Input `expect_output3` must be size 2 or 1, not 8.
i The error occured in group 1: x = "a", y = "v1".
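
One plausible reading of this error, offered as a sketch rather than a confirmed diagnosis: inside the pipe, `.` is the entire grouped data frame, so group_indices(.) returns one index per row (8 values), while mutate() evaluates the expression once per group and needs a result of that group's size (2 for the first group) or of size 1. The snippet below only inspects those sizes; the names `grouped` and `idx` are introduced here for illustration and are not part of the original code.

```r
library(dplyr)  # assumes dplyr >= 1.0.0

rep_data <- data.frame(
  x = c("a", "a", "a", "a", "b", "b", "b", "c"),
  y = c("v1", "v1", "v2", "v3", "v1", "v2", "v3", "v3")
)

# `grouped` stands in for what `.` refers to inside the pipe (illustrative name)
grouped <- rep_data %>% group_by(x, y)

# One index per row of the whole data frame
idx <- group_indices(grouped)
length(idx)          # 8

# Number of rows in each group; the first group (x = "a", y = "v1") has 2
group_size(grouped)  # 2 1 1 1 1 1 1
```

Under this reading, the mismatch between 8 values and a group of size 2 is what the "can't be recycled to size 2" message is reporting.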

Keeping group_indices() out of the mutate() works fine and returns the expected vector. However, I'd like to keep manipulating my data in a pipe chain rather than assign the result separately, as I've seen suggested on other questions. For example, I don't want to have to do this:

rep_data$expect_output3 = rep_data %>% group_by(x,y) %>% group_indices()

Is there a way to call group_indices() and add the resulting vector to my data while maintaining my pipe chain? I'm more than happy to use a different function instead of group_indices(), but I haven't yet found one that works for my purposes.

Any help would be appreciated. Thanks!

Asked by jackbio, Mar 03 '23 07:03


1 Answer

I could not reproduce the error in dplyr 1.0.0, but this use of group_indices() is being deprecated. Use cur_group_id() instead:

library(dplyr) # v1.0.0
rep_data %>%
  group_by(x, y) %>%
  mutate(expect_output2 = cur_group_id())
# A tibble: 8 x 4
# Groups:   x, y [7]
#  x     y     expect_output expect_output2
#  <chr> <chr>         <dbl>          <int>
#1 a     v1                1              1
#2 a     v1                1              1
#3 a     v2                2              2
#4 a     v3                3              3
#5 b     v1                4              4
#6 b     v2                5              5
#7 b     v3                6              6
#8 c     v3                7              7
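
As a minimal sketch of how this can stay inside a longer pipe chain, the example below adds ungroup() after the mutate() so that later verbs operate on the whole data frame again, and then compares the new ids with expect_output. The names `result` and `matches` are introduced here for illustration only, and the code assumes dplyr 1.0.0 or later.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides cur_group_id()

rep_data <- data.frame(
  x = c("a", "a", "a", "a", "b", "b", "b", "c"),
  y = c("v1", "v1", "v2", "v3", "v1", "v2", "v3", "v3"),
  expect_output = c(1, 1, 2, 3, 4, 5, 6, 7)
)

result <- rep_data %>%
  group_by(x, y) %>%
  mutate(expect_output2 = cur_group_id()) %>%  # one integer id per (x, y) group
  ungroup() %>%                                # drop the grouping before further steps
  mutate(matches = expect_output2 == expect_output)  # illustrative check column

all(result$matches)  # TRUE
```

Note that the ids follow the sorted order of the group keys rather than the order in which groups first appear; in this data the two orders happen to coincide.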

Answered by akrun, Mar 11 '23 20:03