I have the following data frame:
> test = data.frame(A = sample(1:5, 10, replace = T)) %>% arrange(A)
> test
A
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 4
9 4
10 5
I now want every row to have an ID that is only incremented when the value of A changes. This is what I have tried:
> test = test %>% mutate(id = as.numeric(rownames(test))) %>% group_by(A) %>% mutate(id = min(id))
> test
A id
(int) (dbl)
1 1 1
2 1 1
3 1 1
4 2 4
5 2 4
6 2 4
7 2 4
8 4 8
9 4 8
10 5 10
However, I would like to get the following:
A id
(int) (dbl)
1 1 1
2 1 1
3 1 1
4 2 2
5 2 2
6 2 2
7 2 2
8 4 3
9 4 3
10 5 4
library(dplyr)
test %>% mutate(id = dense_rank(A))
One compact option is data.table. Convert the 'data.frame' to a 'data.table' with setDT(test), then, grouped by 'A', assign (:=) .GRP as the new 'id' column. .GRP is the group counter, so it yields one sequential integer for each unique value in 'A'.
library(data.table)
setDT(test)[, id:=.GRP, A]
If the values of 'A' can recur non-consecutively, e.g. 3, 3, 4, 3, and we want 1, 1, 2, 3 for the 'id', use rleid, which starts a new id at every change in value:
setDT(test)[, id:= rleid(A)]
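To make the difference between the two kinds of id concrete, here is a minimal base R sketch (not the data.table code above, just an illustration with no package dependencies). It uses the 3, 3, 4, 3 vector from the text; the variable names group_id and run_id are made up for this example.

```r
# Illustrative vector from the text: the value 3 recurs after a 4.
A <- c(3, 3, 4, 3)

# Group id: one id per distinct value, numbered by first appearance.
group_id <- match(A, unique(A))

# Run id: increment whenever the value differs from the previous element.
# Assumes A has at least one element and contains no NA values.
run_id <- cumsum(c(TRUE, A[-1] != A[-length(A)]))

print(group_id)
print(run_id)
```

The group id reuses 1 for the last element because 3 was already seen, while the run id moves on to 3, which is the behaviour described for rleid.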
Or we convert 'A' to factor class and then coerce it back to numeric/integer:
library(dplyr)
test %>%
mutate(id = as.integer(factor(A)))
Or we can match 'A' with the unique values in 'A':
test %>%
mutate(id = match(A, unique(A)))
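The factor and match approaches agree on sorted input such as the question's data, but they can differ when 'A' is not sorted: factor numbers the levels in sorted order, while match numbers values by first appearance. A small base R sketch, assuming a made-up unsorted vector for the comparison:

```r
# Sorted values, as in the question's data frame.
A_sorted <- c(1, 1, 1, 2, 2, 2, 2, 4, 4, 5)

# Hypothetical unsorted vector, used only to show where the two differ.
A_unsorted <- c(4, 4, 1, 2)

# factor() orders its levels by sorted value.
by_factor_sorted   <- as.integer(factor(A_sorted))
by_factor_unsorted <- as.integer(factor(A_unsorted))

# match() against unique() numbers values by first appearance.
by_match_sorted   <- match(A_sorted, unique(A_sorted))
by_match_unsorted <- match(A_unsorted, unique(A_unsorted))

print(by_factor_unsorted)
print(by_match_unsorted)
```

Which numbering is preferable depends on whether the ids should follow the values or the row order.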
Or, with dplyr versions newer than 0.4.0, we can use group_indices (this is also covered in the linked duplicate):
test %>%
mutate(id=group_indices_(test, .dots= "A"))
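The underscore-suffixed verbs such as group_indices_ have been deprecated in later dplyr releases. As a hedged alternative, assuming dplyr 1.0.0 or newer is installed, the same ids can be sketched with cur_group_id() inside a grouped mutate. The data frame below hard-codes the values printed in the question instead of sampling them.

```r
library(dplyr)

# Values of A copied from the question's printed data frame.
test <- data.frame(A = c(1, 1, 1, 2, 2, 2, 2, 4, 4, 5))

res <- test %>%
  group_by(A) %>%                  # one group per distinct value of A
  mutate(id = cur_group_id()) %>%  # integer index of the current group
  ungroup()

print(res)
```

Groups are numbered in sorted order of 'A', so on this data the result matches the dense_rank and factor approaches.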