Get indices of repeated instances of elements of a vector in other vector (both very large)

Question

I have two vectors, one (A) of about 100 million non-unique elements (integers), the other (B) of 1 million of the same, unique, elements. I am trying to get a list containing the indices of the repeated instances of each element of B in A.

A <- c(2, 1, 1, 1, 2, 1, 1, 3, 3, 2)
B <- 1:3

# would result in this:
[[1]]
[1] 2 3 4 6 7

[[2]]
[1]  1  5 10

[[3]]
[1] 8 9

I first, naively, tried this:

b_indices <- lapply(B, function(b) which(A == b))

which is horribly inefficient, and apparently wouldn't complete in a few years.

The second thing I tried was to create a list of empty vectors, indexed with all elements of B, and to then loop through A, appending the index to the corresponding vector for each element in A. Although technically O(n), I'm not sure about the time to repeatedly append elements. This approach would apparently take ~ 2-3 days, which is still too slow...

Is there anything that could work faster?

Roland · Accepted Answer

This is fast:

A1 <- order(A, method = "radix")

split(A1, A[A1])
#$`1`
#[1] 2 3 4 6 7
#
#$`2`
#[1]  1  5 10
#
#$`3`
#[1] 8 9

B <- seq_len(1e6)
set.seed(42)
A <- sample(B, 1e8, TRUE)

system.time({
  A1 <- order(A, method = "radix")

  res <- split(A1, A[A1])
})
# user      system     elapsed 
#8.650       1.056       9.704

Get indices of repeated instances of elements of a vector in other vector (both very large)

Tags:

indexing

r

Gerome Pistre

1 Answers

Roland

Recent Activity

Donate For Us

Get indices of repeated instances of elements of a vector in other vector (both very large)

Tags:

indexing

r

Gerome Pistre

1 Answers

Roland

Related questions

Recent Activity

Donate For Us