
R create ID within a group [duplicate]

Tags:

r

I have the following dataset:

df<-structure(list(IDFAM = c("2010 7599 2996 1", "2010 7599 3071 1", 
"2010 7599 3071 1", "2010 7599 3660 1", "2010 7599 4736 1", "2010 7599 6235 1", 
"2010 7599 6299 1", "2010 7599 9903 1", "2010 7599 11013 1", 
"2010 7599 11778 1", "2010 7599 11778 1", "2010 7599 12248 1", 
"2010 7599 13127 1", "2010 7599 14261 1", "2010 7599 16280 1", 
"2010 7599 16280 1", "2010 7599 16280 1", "2010 7599 16280 1", 
"2010 7599 16280 1", "2010 7599 17382 1"), AGED = c(45L, 47L, 
24L, 46L, 46L, 44L, 43L, 43L, 43L, 16L, 43L, 46L, 44L, 47L, 43L, 
16L, 20L, 18L, 18L, 43L)), .Names = c("IDFAM", "AGED"), row.names = c("5614", 
"5748", "5753", "6864", "8894", "11761", "11884", "18738", "20896", 
"22351", "22353", "23267", "24939", "27072", "30946", "30947", 
"30949", "30950", "30952", "33034"), class = "data.frame")

I would like to number the observations within each IDFAM group from 1 to n, where n is the number of observations sharing that IDFAM value. This would result in the following table:

IDFAM              AGED     ID
2010 7599 2996 1    45       1
2010 7599 3071 1    47       1
2010 7599 3071 1    24       2
2010 7599 3660 1    46       1
2010 7599 4736 1    46       1
2010 7599 6235 1    44       1
2010 7599 6299 1    43       1
2010 7599 9903 1    43       1
2010 7599 11013 1   43       1
2010 7599 11778 1   16       1
2010 7599 11778 1   43       2
2010 7599 12248 1   46       1
2010 7599 13127 1   44       1
2010 7599 14261 1   47       1
2010 7599 16280 1   43       1
2010 7599 16280 1   16       2
2010 7599 16280 1   20       3
2010 7599 16280 1   18       4
2010 7599 16280 1   18       5
2010 7599 17382 1   43       1

How can I do this? Thanks.

user2568648 asked Apr 21 '14 at 12:04


2 Answers

There are several ways.

In base R, use ave:

with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
#  [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
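
To keep the counter as a column rather than only printing it, the result of ave can be assigned back to the data frame. The following is a minimal sketch, assuming a small hypothetical data frame named toy in place of df; seq_along(toy$IDFAM) stands in for rep(1, nrow(df)), since the values handed to ave only act as placeholders that FUN overwrites.

```r
# Hypothetical toy data with the same columns as df (not the original data)
toy <- data.frame(
  IDFAM = c("A", "B", "B", "C", "B"),
  AGED  = c(45L, 47L, 24L, 46L, 30L),
  stringsAsFactors = FALSE
)

# ave() splits its first argument by IDFAM, applies seq_along() to each
# piece, and returns the results in the original row order
toy$ID <- ave(seq_along(toy$IDFAM), toy$IDFAM, FUN = seq_along)

toy$ID
# [1] 1 1 2 1 3
```

Rows of the same group do not need to be adjacent: the last "B" row still receives 3.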

With the "data.table" package, use sequence(.N):

library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]

With the "dplyr" package, try:

library(dplyr)
df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))

or (as recommended by Hadley in the comments):

df %>% group_by(IDFAM) %>% mutate(count = row_number(IDFAM))

Update

Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID) in my "splitstackshape" package. It is based on the "data.table" approach above.

library(splitstackshape)
getanID(df, id.vars = "IDFAM")
#                 IDFAM AGED .id
#  1:  2010 7599 2996 1   45   1
#  2:  2010 7599 3071 1   47   1
#  3:  2010 7599 3071 1   24   2
#  4:  2010 7599 3660 1   46   1
#  5:  2010 7599 4736 1   46   1
#  6:  2010 7599 6235 1   44   1
#  7:  2010 7599 6299 1   43   1
#  8:  2010 7599 9903 1   43   1
#  9: 2010 7599 11013 1   43   1
# 10: 2010 7599 11778 1   16   1
# 11: 2010 7599 11778 1   43   2
# 12: 2010 7599 12248 1   46   1
# 13: 2010 7599 13127 1   44   1
# 14: 2010 7599 14261 1   47   1
# 15: 2010 7599 16280 1   43   1
# 16: 2010 7599 16280 1   16   2
# 17: 2010 7599 16280 1   20   3
# 18: 2010 7599 16280 1   18   4
# 19: 2010 7599 16280 1   18   5
# 20: 2010 7599 17382 1   43   1
A5C1D2H2I1M1N2O1R2T1 answered Sep 27 '22 at 02:09


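With dplyr 0.5 you can use the group_indices function. Although it does not work inside mutate, the following approach is straightforward. Note that group_indices returns one identifier per IDFAM group (the same value for every row of a group), not a counter from 1 to n within each group: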

df$id <- df %>% group_indices(IDFAM)
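
The difference between a per-group identifier and a within-group counter can be sketched in base R. This is an illustration only, assuming a hypothetical toy key vector; match with unique numbers groups in order of first appearance, which is not necessarily the order dplyr uses for its group indices.

```r
# Hypothetical toy key column (not the original data)
IDFAM <- c("A", "B", "B", "C", "B")

# One identifier per group, numbered in order of first appearance
group_id <- match(IDFAM, unique(IDFAM))

# Counter from 1 to n within each group, as requested in the question
within_id <- ave(seq_along(IDFAM), IDFAM, FUN = seq_along)

data.frame(IDFAM, group_id, within_id)
#   IDFAM group_id within_id
# 1     A        1         1
# 2     B        2         1
# 3     B        2         2
# 4     C        3         1
# 5     B        2         3
```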
Rodrigo Remedio answered Sep 23 '22 at 02:09