Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

# Either of the following two lines adds the id column by reference:
DT[, id := seq_len(.N), by = cat]
DT[, id := rowid(cat)]
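
To try the ave variant without installing any package, here is a minimal base R sketch. The column names cat and val follow the code above; the values are made up for illustration and the groups are deliberately interleaved.

```r
# Toy data (hypothetical values); groups "a" and "b" are interleaved.
df <- data.frame(
  cat = c("b", "a", "b", "a", "b"),
  val = c(10, 20, 30, 40, 50)
)

# seq_along() runs on each group's slice of val, and ave() writes the
# results back to the original row positions.
df$num <- ave(df$val, df$cat, FUN = seq_along)

# The result is stored in a copy of val, so it is numeric here;
# convert it if an integer counter is wanted.
df$num <- as.integer(df$num)

print(df)
```

Because ave works on groups rather than on consecutive runs, the rows do not have to be sorted by cat first.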

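To make this r-faq question more complete, here is a base R alternative with sequence and rle. Because rle counts consecutive runs, it assumes that the rows of each group are adjacent (for example, that df is sorted by cat):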

df$num <- sequence(rle(df$cat)$lengths)

which gives the intended result:

> df
   cat        val num
4  aaa 0.05638315   1
2  aaa 0.25767250   2
1  aaa 0.30776611   3
5  aaa 0.46854928   4
3  aaa 0.55232243   5
10 bbb 0.17026205   1
8  bbb 0.37032054   2
6  bbb 0.48377074   3
9  bbb 0.54655860   4
7  bbb 0.81240262   5
13 ccc 0.28035384   1
14 ccc 0.39848790   2
11 ccc 0.62499648   3
15 ccc 0.76255108   4
12 ccc 0.88216552   5

If df$cat is a factor, you need to wrap it in as.character first, because rle does not accept factors:

df$num <- sequence(rle(as.character(df$cat))$lengths)
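
The rle approach numbers consecutive runs, not groups, so the row order matters. The following sketch uses made-up data in which group "a" appears in two separate runs; df_sorted and unsorted_num are names chosen only for this illustration.

```r
# Hypothetical example: group "a" occurs in two separate runs.
df <- data.frame(cat = c("a", "a", "b", "a"), stringsAsFactors = FALSE)

# On the unsorted data the last "a" starts a new run and restarts at 1.
unsorted_num <- sequence(rle(df$cat)$lengths)
print(unsorted_num)  # 1 2 1 1

# Ordering by group first turns every group into one contiguous run.
df_sorted <- df[order(df$cat), , drop = FALSE]
df_sorted$num <- sequence(rle(df_sorted$cat)$lengths)
print(df_sorted)     # num is 1 2 3 for "a" and 1 for "b"
```

If the original row order has to be kept, the ave variant shown earlier is the safer choice.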

Here is a small improvement that also sorts by 'val' within each group:

# 1. Data set
set.seed(100)
df <- data.frame(
  cat = c(rep("aaa", 5), rep("ccc", 5), rep("bbb", 5)), 
  val = runif(15))             

# 2. 'dplyr' approach
library(dplyr)

df %>% 
  arrange(cat, val) %>%   # sort by group, then by val within the group
  group_by(cat) %>% 
  mutate(id = row_number())
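
The same idea can be sketched in base R, assuming the data set from step 1: order the rows by cat and val first, then number them within each group. This is one possible translation, not the method of the answer above.

```r
# Same data set as in step 1 above.
set.seed(100)
df <- data.frame(
  cat = c(rep("aaa", 5), rep("ccc", 5), rep("bbb", 5)),
  val = runif(15)
)

# Sort by group, then by val within each group.
df <- df[order(df$cat, df$val), ]

# Number the rows within each group in the new order. Passing an
# integer vector as the first argument keeps the counter integer.
df$id <- ave(seq_along(df$cat), df$cat, FUN = seq_along)

print(df)
```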

Another dplyr possibility could be:

df %>%
 group_by(cat) %>%
 mutate(num = 1:n())

   cat      val   num
   <fct>  <dbl> <int>
 1 aaa   0.0564     1
 2 aaa   0.258      2
 3 aaa   0.308      3
 4 aaa   0.469      4
 5 aaa   0.552      5
 6 bbb   0.170      1
 7 bbb   0.370      2
 8 bbb   0.484      3
 9 bbb   0.547      4
10 bbb   0.812      5
11 ccc   0.280      1
12 ccc   0.398      2
13 ccc   0.625      3
14 ccc   0.763      4
15 ccc   0.882      5

I would like to add a data.table variant using the rank() function. It makes it possible to change the ordering, which makes it a bit more flexible than the seq_len() solution, and it is similar to the row_number functions in an RDBMS. Note that rank() assigns averaged ranks to tied values by default; pass ties.method = "first" if strictly consecutive numbers are needed.

# Variant with ascending ordering
library(data.table)
dt <- data.table(df)
dt[, .(val, num = rank(val)), by = cat][order(cat, num)]

    cat        val num
 1: aaa 0.05638315   1
 2: aaa 0.25767250   2
 3: aaa 0.30776611   3
 4: aaa 0.46854928   4
 5: aaa 0.55232243   5
 6: bbb 0.17026205   1
 7: bbb 0.37032054   2
 8: bbb 0.48377074   3
 9: bbb 0.54655860   4
10: bbb 0.81240262   5
11: ccc 0.28035384   1
12: ccc 0.39848790   2
13: ccc 0.62499648   3
14: ccc 0.76255108   4

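# Variant with descending ordering
# desc() is a dplyr function, so it is called with its namespace here;
# for a numeric val, rank(-val) is a base R equivalent
dt[, .(val, num = rank(dplyr::desc(val))), by = cat][order(cat, num)]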

Edit on 2021-04-16 to make the switch between descending and ascending order more fail-safe
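
One difference between rank() and a plain row counter shows up with ties. The short base R sketch below uses a made-up vector; the variable names are chosen only for this illustration.

```r
# Hypothetical values with one tie (0.2 occurs twice).
val <- c(0.2, 0.5, 0.2)

# Default ties.method = "average": tied values share the mean rank.
r_avg <- rank(val)
print(r_avg)    # 1.5 3.0 1.5

# ties.method = "first" breaks ties by position, giving 1..n.
r_first <- rank(val, ties.method = "first")
print(r_first)  # 1 3 2

# Negating a numeric vector gives the descending variant.
r_desc <- rank(-val, ties.method = "first")
print(r_desc)   # 2 1 3
```

With ties.method = "first" the result behaves like SQL's ROW_NUMBER(), whereas the default can produce non-integer numbers.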

Here is an option using a for loop over groups rather than over rows (as the OP did):

for (i in unique(df$cat)) df$num[df$cat == i] <- seq_len(sum(df$cat == i))

Tags: dataframe, r, r-faq
