This is my first post and I'm new to programming and R.
I'm trying to create a new column to mark or flag sequentially duplicated values in a separate column.
df <- c(2,2,2,2,3,4,3,4,3,4,2,3,7,7,7)
Using the duplicated function returns the following:
data.frame(value = df, flag = duplicated(df))
value flag
1 2 FALSE
2 2 TRUE
3 2 TRUE
4 2 TRUE
5 3 FALSE
6 4 FALSE
7 3 TRUE
8 4 TRUE
9 3 TRUE
10 4 TRUE
11 2 TRUE
12 3 TRUE
13 7 FALSE
14 7 TRUE
15 7 TRUE
What I'd like is:
value flag
1 2 TRUE
2 2 TRUE
3 2 TRUE
4 2 TRUE
5 3 FALSE
6 4 FALSE
7 3 FALSE
8 4 FALSE
9 3 FALSE
10 4 FALSE
11 2 FALSE
12 3 FALSE
13 7 TRUE
14 7 TRUE
15 7 TRUE
My data set has over 2 million observations, so ideally the solution would be efficient.
Thank you, John
rle will get you what you are after, in combination with rep:
rl <- rle( df )
rep( rl$lengths != 1 , times = rl$lengths )
# [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
# [15] TRUE
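To make the intermediate step visible, here is a small sketch on the vector from the question. It shows what rle returns and then attaches the flag as a new column, which is what the question asks for. The names x, flag and result are only illustrative and do not come from the original post.

```r
# Sample vector from the question
x <- c(2, 2, 2, 2, 3, 4, 3, 4, 3, 4, 2, 3, 7, 7, 7)

# rle() collapses consecutive equal values into runs
rl <- rle(x)
rl$values   # value of each run:  2 3 4 3 4 3 4 2 3 7
rl$lengths  # length of each run: 4 1 1 1 1 1 1 1 1 3

# A run counts as a sequential duplicate when it is longer than 1;
# rep() expands the per-run answer back to one entry per element
flag <- rep(rl$lengths > 1, times = rl$lengths)

# Attach the flag as a new column (result is an illustrative name)
result <- data.frame(value = x, flag = flag)
```

Here rl$lengths > 1 is equivalent to rl$lengths != 1, since run lengths are always at least 1.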
And I believe rle is fairly efficient.
Timing (MBP late 2008) on a 2e6 length vector:
system.time({ rl <- rle( df )
res <- rep( rl$lengths != 1 , times = rl$lengths )
})
# user system elapsed
# 0.449 0.106 0.559
Since you have more than 2 million observations, I really recommend switching to data.table. Here is my solution using rle, similar to @Simon's; I have just written its data.table version. I believe that is not always obvious, especially for beginners (like me with data.table).
library(data.table)
set.seed(1234)
dd <- sample(1:20, 2e+06, replace = TRUE)
DT <- data.table(dd)
system.time(DT[, `:=`(grp2, {
dd.rle = rle(dd) ## store rle to not call it twice
rep(dd.rle$lengths > 1, times = dd.rle$lengths)
})])
## user system elapsed
## 1.17 0.06 1.28
## user system elapsed <- rle twice
## 1.69 0.11 1.86
## dd grp2
## 1e+00: 3 FALSE
## 2e+00: 13 TRUE
## 3e+00: 13 TRUE
## 4e+00: 13 TRUE
## 5e+00: 18 FALSE
## ---
## 2e+06: 6 FALSE
## 2e+06: 5 FALSE
## 2e+06: 4 FALSE
## 2e+06: 10 FALSE
## 2e+06: 13 FALSE
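For comparison, here is a minimal sketch of the same data.table approach on the small vector from the question, so the result can be checked against the desired output. The column names value, flag and flag2 are illustrative. The second variant uses data.table's rleid to group by run; it is an alternative formulation, not the method timed above, and I have not benchmarked it on 2 million rows.

```r
library(data.table)

# Sample vector from the question
x <- c(2, 2, 2, 2, 3, 4, 3, 4, 3, 4, 2, 3, 7, 7, 7)
DT <- data.table(value = x)

# := adds the column by reference, without copying the table
DT[, flag := {
  r <- rle(value)                         # store rle so it is computed once
  rep(r$lengths > 1, times = r$lengths)   # expand per-run flag to per-row
}]

# Alternative: rleid() gives each run its own group id,
# and .N is the number of rows in the current group
DT[, flag2 := .N > 1, by = rleid(value)]
```

Both columns should agree: TRUE for the four leading 2s and the three trailing 7s, FALSE for everything in between.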