Create Unique ID based on a column and calculate the mean based on the generated ID in R

Question

I am trying to generate unique ID for each event/episode and then wanted to calculate the mean of a column based on the generated ID.

Here is an example:

Event	Val
A	3
A	5
A	1
B	8
B	2
C	1
C	3
A	2
A	5

First, I would like to create an ID column as below,

Event	Val	ID
A	3	A1
A	5	A1
A	1	A1
B	8	B1
B	2	B1
C	1	C1
C	3	C1
A	2	A2
A	5	A2

Second, I would like to calculate the mean of 'Val' based on the generated ID column.

Thanks for your help and suggestions.

I tried 'cumsum' and 'duplicated' function, but that tend to generate continous ID numbers.

ismirsehregal · Accepted Answer

A data.table approach:

library(data.table)

DT <- data.table(
    Event = c("A", "A", "A", "B", "B", "C", "C", "A", "A"),
    Val = c(3L, 5L, 1L, 8L, 2L, 1L, 3L, 2L, 5L)
  )

DT[, ID := rleid(Event)][, ID := rleid(ID), by = Event][, ID := paste0(Event, ID)][, mean := mean(Val), by = ID]

> DT
   Event Val ID mean
1:     A   3 A1  3.0
2:     A   5 A1  3.0
3:     A   1 A1  3.0
4:     B   8 B1  5.0
5:     B   2 B1  5.0
6:     C   1 C1  2.0
7:     C   3 C1  2.0
8:     A   2 A2  3.5
9:     A   5 A2  3.5

Maël · Answer

A similar approach than @ismirsehregal but with dplyr. Note that consecutive_id was introduced with dplyr 1.1.0, the latest dplyr update.

library(dplyr) #1.1.0+ 
df %>% 
  mutate(id = consecutive_id(Event)) %>% 
  mutate(id = paste0(Event, consecutive_id(id)), .by = Event) %>%
  mutate(mean = mean(Val), .by = id)

  Event Val id mean
1     A   3 A1  3.0
2     A   5 A1  3.0
3     A   1 A1  3.0
4     B   8 B1  5.0
5     B   2 B1  5.0
6     C   1 C1  2.0
7     C   3 C1  2.0
8     A   2 A2  3.5
9     A   5 A2  3.5

And another approach based on cumsum and non-consecutive row_number:

df %>% 
  mutate(id = row_number()) %>% 
  mutate(id = paste0(Event, cumsum(c(0, diff(id)) != 1)), .by = Event) %>% 
  mutate(mean = mean(Val), .by = id)

Create Unique ID based on a column and calculate the mean based on the generated ID in R

Tags:

uniqueidentifier

r

dplyr

Karthik

2 Answers

ismirsehregal

Maël

Recent Activity

Donate For Us

Create Unique ID based on a column and calculate the mean based on the generated ID in R

Tags:

uniqueidentifier

r

dplyr

Karthik

2 Answers

ismirsehregal

Maël

Related questions

Recent Activity

Donate For Us