I have a dataframe which contains two columns, id column id and an observation column val with observations from a-g in random order.
data.frame(
id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
val = c("a", "b", "c", "d", "e", "f", "a", "b", "b", "a", "c", "e", "d", "a", "b","c","f","g")
)
id val
1 1 a
2 1 b
3 1 c
4 1 d
5 1 e
6 2 f
7 2 a
8 2 b
9 3 b
10 3 a
11 3 c
12 3 e
13 3 d
14 4 a
15 4 b
16 4 c
17 4 f
18 4 g
I want to assign an (cumulative) identifier valid for each unique set of observations (belonging to each id) given in column val, like so:
id val valid
1 1 a 1
2 1 b 1
3 1 c 1
4 1 d 1
5 1 e 1
6 2 f 2
7 2 a 2
8 2 b 2
9 3 b 1
10 3 a 1
11 3 c 1
12 3 e 1
13 3 d 1
14 4 a 3
15 4 b 3
16 4 c 3
17 4 f 3
18 4 g 3
You can try the base R option like below
u <- with(df, by(val, id, sort))
dfout <- merge(
df,
setNames(
stack(setNames(match(u, unique(u)), names(u))),
c("valid", "id")
),
all = TRUE
)
and you will obtain
> dfout
id val valid
1 1 a 1
2 1 b 1
3 1 c 1
4 1 d 1
5 1 e 1
6 2 f 2
7 2 a 2
8 2 b 2
9 3 b 1
10 3 a 1
11 3 c 1
12 3 e 1
13 3 d 1
14 4 a 3
15 4 b 3
16 4 c 3
17 4 f 3
18 4 g 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With