How can we generate unique id numbers within each group of a dataframe? Here's some data grouped by "personid":
personid date measurement
1 x 23
1 x 32
2 y 21
3 x 23
3 z 23
3 y 23
I wish to add an id column with a unique value for each row within each subset defined by "personid", always starting with 1
. This is my desired output:
personid date measurement id
1 x 23 1
1 x 32 2
2 y 21 1
3 x 23 1
3 z 23 2
3 y 23 3
I appreciate any help.
The count() method can be applied to the input dataframe containing one or more columns and returns a frequency count corresponding to each of the groups. The columns returned on the application of this method is a proper subset of the columns of the original dataframe.
The simplest way to create a sequence of numbers in R is by using the : operator. Type 1:20 to see how it works. That gave us every integer between (and including) 1 and 20 (an integer is a positive or negative counting number, including 0).
R provides us nrow() function to get the rows for an object. That is, with nrow() function, we can easily detect and extract the number of rows present in an object that can be matrix, data frame or even a dataset.
group_by() function along with n() is used to count the number of occurrences of the group in R. group_by() function takes “State” and “Name” column as argument and groups by these two columns and summarise() uses n() function to find count of a sales.
Some dplyr
alternatives, using convenience functions row_number
and n
.
library(dplyr)
df %>% group_by(personid) %>% mutate(id = row_number())
df %>% group_by(personid) %>% mutate(id = 1:n())
df %>% group_by(personid) %>% mutate(id = seq_len(n()))
df %>% group_by(personid) %>% mutate(id = seq_along(personid))
You may also use getanID
from package splitstackshape
. Note that the input dataset is returned as a data.table
.
getanID(data = df, id.vars = "personid")
# personid date measurement .id
# 1: 1 x 23 1
# 2: 1 x 32 2
# 3: 2 y 21 1
# 4: 3 x 23 1
# 5: 3 z 23 2
# 6: 3 y 23 3
The misleadingly named ave()
function, with argument FUN=seq_along
, will accomplish this nicely -- even if your personid
column is not strictly ordered.
df <- read.table(text = "personid date measurement
1 x 23
1 x 32
2 y 21
3 x 23
3 z 23
3 y 23", header=TRUE)
## First with your data.frame
ave(df$personid, df$personid, FUN=seq_along)
# [1] 1 2 1 1 2 3
## Then with another, in which personid is *not* in order
df2 <- df[c(2:6, 1),]
ave(df2$personid, df2$personid, FUN=seq_along)
# [1] 1 1 1 2 3 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With