I have two vectors:
vec1 <- c(0, 1, 2, 3, 4, 5, 6, 7, 9)
vec2 <- c(1, 2, 7, 5, 3, 6, 80, 4, 8)
I would like to set the same order in vec1
as it is in vec2
. For example, in vec2
the highest number (position 9) is in position 7, so I would like to put the highest number in vec1
(position 9, number 9) to position 7.
Expected output:
vec1 <- c(0, 1, 6, 4, 2, 5, 9, 3, 7)
I don't have any duplicated values in any vector.
I'm primarily interested in efficient Rcpp solutions but also anything in R is welcome.
Another baseR
option is match
vec1[match(vec2, sort(vec2))]
# [1] 0 1 6 4 2 5 9 3 7
edit
Including a benchmark with larger sample size
set.seed(42)
n <- 1e6
vec1 <- seq_len(n)
vec2 <- sample(1:1e7, size = n)
benchmarks <- bench::mark(match = vec1[match(vec2, sort(vec2))],
rank = vec1[rank(vec2)],
frank = vec1[data.table::frank(vec2)],
order_order = vec1[order(order(vec2))],
rcpp_order_order = foo(vec1, vec2),
iterations = 25)
benchmarks[ , 1:3]
Result
# A tibble: 5 x 3
# expression min median
# <bch:expr> <bch:tm> <bch:tm>
#1 match 259.8ms 322ms
#2 rank 825.9ms 876ms
#3 frank 88.6ms 134ms
#4 order_order 110.6ms 139ms
#5 rcpp_order_order 793.5ms 893ms
We can adapt the Rcpp
version of order()
from this answer (to account for the fact that you do not want to check for duplicates and adding a function to order by an order of an ordering) to make the following Rcpp
solution:
#include <Rcpp.h>
Rcpp::IntegerVector order(const Rcpp::NumericVector& x) {
return Rcpp::match(Rcpp::clone(x).sort(), x);
}
Rcpp::IntegerVector order(const Rcpp::IntegerVector& x) {
return Rcpp::match(Rcpp::clone(x).sort(), x);
}
// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector x,
const Rcpp::NumericVector y) {
return x[order(order(y))-1];
}
Then we get the expected results:
library(Rcpp)
sourceCpp("foo.cpp")
vec1 <- c(0, 1, 2, 3, 4, 5, 6, 7, 9)
vec2 <- c(1, 2, 7, 5, 3, 6, 80, 4, 8)
foo(vec1, vec2)
# [1] 0 1 6 4 2 5 9 3 7
with decent performance (comparisons are to the R solutions presented by other answers):
benchmarks <- bench::mark(match = vec1[match(vec2, sort(vec2))],
rank = vec1[rank(vec2)],
order_order = vec1[order(order(vec2))],
rcpp_order_order = foo(vec1, vec2),
iterations = 10000)
benchmarks[ , 1:3]
# # A tibble: 4 x 3
# expression min median
# <bch:expr> <bch:tm> <bch:tm>
# 1 match 28.4µs 31.72µs
# 2 rank 7.99µs 9.84µs
# 3 order_order 26.27µs 30.61µs
# 4 rcpp_order_order 2.51µs 3.23µs
Note that this solution only works if there are no duplicates. (If you might run into duplicates, adding a check is demonstrated in the linked-to answer). Also note that these benchmarks were just done on this data; I don't know for sure how they change at scale.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With