
Equivalent for Stata's egen group() function

Consider the following dataset:

df = data.frame(id = c(1,1,1,2,2,2,3,3,3), 
                time = c(1,2,3,1,2,3,1,2,3), 
                x = c(8,8,9,7,7,7,7,7,8), 
                id_x = c(1,1,2,3,3,3,4,4,5))

I want to compute id_x (included above as the desired result), which identifies each unique combination of the variables id and x, preferably using dplyr.

In Stata, I can do the following:

Stata
clear

input id time x
1 1 8
1 2 8
1 3 9
2 1 7
2 2 7
2 3 7
3 1 7
3 2 7
3 3 8
end

egen id_x = group(id, x)

list, separator(0)

     +----------------------+
     | id   time   x   id_x |
     |----------------------|
  1. |  1      1   8      1 |
  2. |  1      2   8      1 |
  3. |  1      3   9      2 |
  4. |  2      1   7      3 |
  5. |  2      2   7      3 |
  6. |  2      3   7      3 |
  7. |  3      1   7      4 |
  8. |  3      2   7      4 |
  9. |  3      3   8      5 |
     +----------------------+
safex asked Jun 21 '19 20:06



2 Answers

We can use dplyr::group_indices:

library(dplyr)

#df1 %>% mutate(id_xx = group_indices(.,id,x))
df1 %>% group_by(id,x) %>% mutate(id_xx = group_indices())
#> # A tibble: 9 x 5
#> # Groups:   id, x [5]
#>      id  time     x  id_x id_xx
#>   <dbl> <dbl> <dbl> <dbl> <int>
#> 1     1     1     8     1     1
#> 2     1     2     8     1     1
#> 3     1     3     9     2     2
#> 4     2     1     7     3     3
#> 5     2     2     7     3     3
#> 6     2     3     7     3     3
#> 7     3     1     7     4     4
#> 8     3     2     7     4     4
#> 9     3     3     8     5     5

Data:

df1 <-  data.frame(id = c(1,1,1,2,2,2,3,3,3), 
                time = c(1,2,3,1,2,3,1,2,3), 
                x = c(8,8,9,7,7,7,7,7,8), 
                id_x = c(1,1,2,3,3,3,4,4,5))
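
If dplyr is not available, the same numbering can probably be reproduced in base R. This is only a sketch, assuming the df1 above and an illustrative column name id_xx: interaction() builds one factor level per (id, x) pair, drop = TRUE discards pairs that never occur, and lex.order = TRUE orders the levels by id first and then by x, which is the ordering the Stata output in the question shows.

```r
# Same data as above; id_x holds the values produced by Stata.
df1 <- data.frame(id   = c(1,1,1,2,2,2,3,3,3),
                  time = c(1,2,3,1,2,3,1,2,3),
                  x    = c(8,8,9,7,7,7,7,7,8),
                  id_x = c(1,1,2,3,3,3,4,4,5))

# One factor level per observed (id, x) pair, sorted by id and then x;
# the integer codes of that factor serve as group ids.
df1$id_xx <- as.integer(interaction(df1$id, df1$x,
                                    drop = TRUE, lex.order = TRUE))

# Compare with the Stata result stored in id_x.
all(df1$id_xx == df1$id_x)
```

Without lex.order = TRUE the levels vary fastest in id, so the ids would still identify the groups but would not follow Stata's order.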
M-- answered Oct 26 '22 15:10


While M--'s answer was correct at the time of writing, dplyr has since deprecated calling group_indices() without arguments inside mutate() in favor of cur_group_id(), so the code is now:

df1 %>% group_by(id, x) %>% mutate(id_xx = cur_group_id())
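
For a self-contained check, here is a minimal sketch applied to the data from the question. It assumes dplyr 1.0.0 or later (where cur_group_id() is available); the column name id_xx and the object name res are only illustrative. The trailing ungroup() is an addition so that later operations are not silently carried out per group.

```r
library(dplyr)

# Data from the first answer; id_x holds the values produced by Stata.
df1 <- data.frame(id   = c(1,1,1,2,2,2,3,3,3),
                  time = c(1,2,3,1,2,3,1,2,3),
                  x    = c(8,8,9,7,7,7,7,7,8),
                  id_x = c(1,1,2,3,3,3,4,4,5))

res <- df1 %>%
  group_by(id, x) %>%
  mutate(id_xx = cur_group_id()) %>%  # one integer per (id, x) group
  ungroup()                           # drop the grouping afterwards

# Compare with the Stata result stored in id_x.
all(res$id_xx == res$id_x)
```

Because group_by() orders its groups by the sorted key values, the ids here should line up with those from egen group(id, x).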
Brent answered Oct 26 '22 13:10