Create a sequential number (counter) for rows within each group of a dataframe [duplicate]

Tags: dataframe, r

How can we generate unique id numbers within each group of a dataframe? Here's some data grouped by "personid":

personid date measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23

I wish to add an id column with a unique value for each row within each subset defined by "personid", always starting with 1. This is my desired output:

personid date measurement id
1         x     23         1
1         x     32         2
2         y     21         1
3         x     23         1
3         z     23         2
3         y     23         3

I appreciate any help.

asked Aug 16 '12 22:08 by suresh



2 Answers

Some dplyr alternatives, using the convenience functions row_number() and n():

library(dplyr)
df %>% group_by(personid) %>% mutate(id = row_number())
df %>% group_by(personid) %>% mutate(id = 1:n())
df %>% group_by(personid) %>% mutate(id = seq_len(n()))
df %>% group_by(personid) %>% mutate(id = seq_along(personid))
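For completeness, here is a minimal end-to-end sketch of the first variant, assuming the question's data is read into `df` with `read.table`. The name `df_id` and the trailing `ungroup()` are my additions, not part of the lines above; `ungroup()` just keeps later operations from silently running per group.

```r
library(dplyr)

# Rebuild the example data from the question.
df <- read.table(text = "personid date measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23", header = TRUE)

# Number the rows within each personid group, starting at 1,
# then drop the grouping so the result is a plain tibble.
df_id <- df %>%
  group_by(personid) %>%
  mutate(id = row_number()) %>%
  ungroup()

df_id$id
# [1] 1 2 1 1 2 3
```

Note that none of these calls modify `df` in place; the result has to be assigned, as with `df_id` here.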

You may also use getanID from package splitstackshape. Note that the input dataset is returned as a data.table.

library(splitstackshape)
getanID(data = df, id.vars = "personid")
#    personid date measurement .id
# 1:        1    x          23   1
# 2:        1    x          32   2
# 3:        2    y          21   1
# 4:        3    x          23   1
# 5:        3    z          23   2
# 6:        3    y          23   3

answered Oct 20 '22 17:10 by Henrik


The misleadingly named ave() function, with argument FUN=seq_along, will accomplish this nicely -- even if your personid column is not strictly ordered.

df <- read.table(text = "personid date measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23", header=TRUE)

## First with your data.frame
ave(df$personid, df$personid, FUN=seq_along)
# [1] 1 2 1 1 2 3

## Then with another, in which personid is *not* in order
df2 <- df[c(2:6, 1),]
ave(df2$personid, df2$personid, FUN=seq_along)
# [1] 1 1 1 2 3 2
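
The calls above only print the counter. A minimal sketch of attaching it as the `id` column the question asks for might look like the following. The `key` and `id_by_key` columns are hypothetical, added only to illustrate a grouping column that is not numeric: since ave() returns a vector of the same type as its first argument, passing an integer vector there keeps the counter an integer.

```r
# Rebuild the example data from the question.
df <- read.table(text = "personid date measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23", header = TRUE)

# Attach the within-group counter as a new column.
df$id <- ave(df$personid, df$personid, FUN = seq_along)

# Hypothetical character grouping column, for illustration only.
df$key <- paste0("p", df$personid)

# Use an integer vector as the first argument so the result stays integer
# regardless of the type of the grouping column.
df$id_by_key <- ave(seq_along(df$key), df$key, FUN = seq_along)

df$id
# [1] 1 2 1 1 2 3
```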

answered Oct 20 '22 18:10 by Josh O'Brien