I would like a function that works like cumsum, but instead of adding up the values it counts the number of unique values seen so far. I could write a loop over each successive subset of the vector, but that seems like it could get time-consuming, as my dataset has millions of observations.
Example:
a <- c(1,3,2,4,1,5,2,3)
f(a)
[1] 1 2 3 4 4 5 5 5
You can try:
cumsum(!duplicated(a))
#[1] 1 2 3 4 4 5 5 5
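To get the `f(a)` form asked for in the question, the one-liner can be wrapped in a small function. This is only a sketch: the name `cum_unique` is made up for illustration and is not from any package.

```r
# Hypothetical helper name; it just wraps the duplicated()-based one-liner.
cum_unique <- function(x) {
  # duplicated() flags every repeat of an earlier value, so its negation
  # is TRUE exactly at first occurrences; cumsum() then counts those.
  cumsum(!duplicated(x))
}

a <- c(1, 3, 2, 4, 1, 5, 2, 3)
cum_unique(a)
#[1] 1 2 3 4 4 5 5 5
```

It makes a single vectorised pass with no explicit loop, which matters for the millions of observations mentioned in the question.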
We can try
library(zoo)
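b <- a  # work on a copy so `a` stays intact for the options below
b[duplicated(b)] <- NA                    # blank out repeats
b[!is.na(b)] <- seq_along(b[!is.na(b)])   # number the first occurrences 1, 2, 3, ...
na.locf(b)                                # carry the last count forward over the NAs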
#[1] 1 2 3 4 4 5 5 5
Or another option is
cumsum(ave(a, a, FUN=seq_along)==1)
#[1] 1 2 3 4 4 5 5 5
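To see why the `ave` version works, it may help to look at the intermediate result. A small sketch using the same example vector (the variable name `occ` is only for illustration):

```r
a <- c(1, 3, 2, 4, 1, 5, 2, 3)

# Within each group of equal values, number the occurrences 1, 2, 3, ...
occ <- ave(a, a, FUN = seq_along)
occ
#[1] 1 1 1 1 2 1 2 2

# An occurrence number of 1 marks a value seen for the first time,
# so the running total of those marks is the unique count so far.
cumsum(occ == 1)
#[1] 1 2 3 4 4 5 5 5
```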
Or a compact option would be
library(splitstackshape)
getanID(a)[, cumsum(.id==1)]
#[1] 1 2 3 4 4 5 5 5