I have two vectors, one (A) of about 100 million non-unique elements (integers), the other (B) of 1 million of the same, unique, elements. I am trying to get a list containing the indices of the repeated instances of each element of B in A.
A <- c(2, 1, 1, 1, 2, 1, 1, 3, 3, 2)
B <- 1:3
# would result in this:
[[1]]
[1] 2 3 4 6 7
[[2]]
[1] 1 5 10
[[3]]
[1] 8 9
I first, naively, tried this:
b_indices <- lapply(B, function(b) which(A == b))
which is horribly inefficient, and apparently wouldn't complete in a few years.
The second thing I tried was to create a list of empty vectors, indexed with all elements of B, and to then loop through A, appending the index to the corresponding vector for each element in A. Although technically O(n), I'm not sure about the time to repeatedly append elements. This approach would apparently take ~ 2-3 days, which is still too slow...
Is there anything that could work faster?
This is fast:
A1 <- order(A, method = "radix")
split(A1, A[A1])
#$`1`
#[1] 2 3 4 6 7
#
#$`2`
#[1] 1 5 10
#
#$`3`
#[1] 8 9
B <- seq_len(1e6)
set.seed(42)
A <- sample(B, 1e8, TRUE)
system.time({
A1 <- order(A, method = "radix")
res <- split(A1, A[A1])
})
# user system elapsed
#8.650 1.056 9.704
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With