I have an unsorted vector of length N. Each element of the vector appears is present precisely twice (the vector length is an even number). I have a custom sorting algorithm, and the goal is to iterate until the vector achieves a state in which each element is adjacent to its copy.
Unsorted vector = {A,F,J,E,F,A,J,E}
A valid sorted state = {A,A,J,J,E,E,F,F}
Another valid sorted state = {J,J,A,A,F,F,E,E}
So my question lies in what is the fastest way to check if a sorted state is valid so that I can speed up my iterations? For long vectors, this will dictate most of my scaling ability.
An option involves converting the vector
(as the length
is even
and an element is present exactly twice) to a two-row matrix, get the unique
and test whether the number of rows is 1. If values duplicated are adjacent, while adding the dim
attributes with matrix
, the second row will be exactly the same as the first
f1 <- function(x)
{
nrow(unique(matrix(x, nrow = 2))) == 1
}
-testing
> v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
> v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
> v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")
> f1(v1)
[1] FALSE
> f1(v2)
[1] TRUE
> f1(v3)
[1] TRUE
Or slightly more faster
f2 <- function(x)
{
sum(duplicated(matrix(x, nrow = 2))) == 1
}
-testing
> f2(v1)
[1] FALSE
> f2(v2)
[1] TRUE
> f2(v3)
[1] TRUE
-benchmarks
#thelatemail
> f3 <- function(x) all(duplicated(x) == c(FALSE,TRUE))
#TarJae
> f4 <- function(x) {rle_obj <- rle(x); all(rle_obj$lengths > 1)}
> x1 <- rep(1:1e8, each = 2)
> system.time(f1(x1))
user system elapsed
2.649 0.456 3.111
> system.time(f2(x1))
user system elapsed
2.258 0.433 2.694
> system.time(f3(x1))
user system elapsed
9.972 1.272 11.233
> system.time(f4(x1))
user system elapsed
7.051 3.281 10.333
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With