Consider the following dataset:
df = data.frame(id = c(1,1,1,2,2,2,3,3,3),
time = c(1,2,3,1,2,3,1,2,3),
x = c(8,8,9,7,7,7,7,7,8),
id_x = c(1,1,2,3,3,3,4,4,5))
I want to compute id_x, which identifies each unique combination of the variables id and x (preferably using dplyr).
In Stata, I can do the following:
Stata
clear
input id time x
1 1 8
1 2 8
1 3 9
2 1 7
2 2 7
2 3 7
3 1 7
3 2 7
3 3 8
end
egen id_x = group(id, x)
list, separator(0)
+----------------------+
| id time x id_x |
|----------------------|
1. | 1 1 8 1 |
2. | 1 2 8 1 |
3. | 1 3 9 2 |
4. | 2 1 7 3 |
5. | 2 2 7 3 |
6. | 2 3 7 3 |
7. | 3 1 7 4 |
8. | 3 2 7 4 |
9. | 3 3 8 5 |
+----------------------+
egen creates a new variable of the optionally specified storage type equal to the given function based on arguments of that function.
The Stata command egen, which stands for extended generation, is used to create variables that require some additional function in order to be generated. Examples of these functions include taking the mean, discretizing a continuous variable, and counting how many of a set of variables have missing values.
We can use dplyr::group_indices:
library(dplyr)
#df1 %>% mutate(id_xx = group_indices(.,id,x))
df1 %>% group_by(id,x) %>% mutate(id_xx = group_indices())
#> # A tibble: 9 x 5
#> # Groups: id, x [5]
#> id time x id_x id_xx
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 1 8 1 1
#> 2 1 2 8 1 1
#> 3 1 3 9 2 2
#> 4 2 1 7 3 3
#> 5 2 2 7 3 3
#> 6 2 3 7 3 3
#> 7 3 1 7 4 4
#> 8 3 2 7 4 4
#> 9 3 3 8 5 5
df1 <- data.frame(id = c(1,1,1,2,2,2,3,3,3),
time = c(1,2,3,1,2,3,1,2,3),
x = c(8,8,9,7,7,7,7,7,8),
id_x = c(1,1,2,3,3,3,4,4,5))
While M--'s answer was correct at the time of writing, dplyr has since deprecated group_indices() in favor of cur_group_id(), so the code is now:
df1 %>% group_by(id, x) %>% mutate(id_xx = cur_group_id())
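For a version that runs on its own against the question's data, here is a minimal sketch, assuming dplyr 1.0.0 or later (the release where cur_group_id() is available). The trailing ungroup() and the result name res are my own additions for illustration, not part of the answer above.

```r
library(dplyr)

# Same data as in the question, without the precomputed id_x column.
df1 <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                  time = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  x    = c(8, 8, 9, 7, 7, 7, 7, 7, 8))

res <- df1 %>%
  group_by(id, x) %>%                # one group per (id, x) combination
  mutate(id_x = cur_group_id()) %>%  # integer index of the current group
  ungroup()                          # drop the grouping before further work

res$id_x
#> [1] 1 1 2 3 3 3 4 4 5
```

The group index follows the sorted order of the grouping keys (id first, then x), which is why the numbering here lines up with the egen group(id, x) output shown in the question.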