I'm trying to remove 140,000 objects from an ArrayList of size 7,140,000. I expected this would take seconds (if that), but instead Java is taking multiple seconds per thousand objects. Here is my code:
for (int i = list.size(); i > P; i--)
{
int size = list.size();
int index = (int) (Math.random() * size);
list.remove(index);
}
Note: P is a constant that I previously set to 7,000,000.
The goal of the loop is to randomly remove objects from list until its size is 7,000,000.
Is Java taking such a long time because I'm starting off with over 7 million objects? I've never noticed this efficiency problem with removing from ArrayLists in the past. If it helps, I use the DrJava Beta IDE.
Every time you remove an element from an ArrayList, it has to shuffle all of the elements with greater indexes down by one slot. Say you remove the first element of a 7M-element list - you've then got to move 6,999,999 elements too.
If you're doing this in a loop, it will take O(n^2)
time, where n
is the size of the list. For a 7M-element list, that's going to be pretty slow.
Instead, if you know which elements you want to remove in advance, you can shift all the elements down in a single pass:
int dst = 0;
for (int src = 0; src < list.size(); ++src) {
if (!toRemove(src)) {
list.set(dst++, list.get(src));
}
}
list.subList(dst, list.size()).clear();
where toRemove(src)
is some function which says whether you want to remove the src
-th element.
For example, you might construct a BitSet
with all but P
elements set:
BitSet toRemove = new BitSet(list.size());
for (int i = list.size(); i > P; i--) {
int rand;
do {
rand = Math.random() * list.size();
} while (toRemove.get(rand));
toRemove.set(rand, true);
}
You still have to shift all of the 6,999,999 elements to the right if you just remove the zero-element from a 7M element list; but any other removals don't require any more shifts on top. This algorithm is O(n)
, where n is the size of the list.
Edit: you can pick P
elements from the list (where P <= list.size()
) like this:
int dst = 0;
Random rand = new Random();
for (int src = 0; dst < P; ++src) {
if (rand.nextInt(list.size() - src) < (P-dst)) {
list.set(dst++, list.get(src));
}
}
list.subList(dst, list.size()).clear();
This strategy will select elements from the list with equal probability (*), and works well for any value of P
; it also preserves the original order.
If you want to sample K
items from a list with N
items without drawing the same element twice, there are choose(N, K) = N! / (K! * (N-K)!)
ways to do this. If you want to pick all elements from the list with equal probability, then you should pick any of these c(n,k)
different configurations.
When there are k
items left to pick from n
items, you will either:
k-1
items from the remaining n-1
items; ork
items from the remaining n-1
items.In order to ensure the equal probability of picking the K
elements overall, you need to choose one of the two options according to the number of combinations for picking from the n-1
elements:
#(combinations after taking first item)
P(take first item) = ------------------------------------------------------------------
#(combinations after taking) + #(combinations after not taking)
= C(n-1,k-1) / (C(n-1, k-1) + C(n-1, k))
= ... working omitted ...
= k / n
So, when you've got k
items left to take from n
, you should take the first item k/n
of the time.
The two interesting cases to point out are:
k == n
, k/n = 1
, so you always take the element. Intuitively, if you've got to pick n
items out of n
, you've got to take them all.k == 0
, k/n = 0
, so you never take the element. Intuitively, if you've already picked all K
of your items, you don't need to take any more.To implement this, you can simply generate a uniformly-distributed random number r
in the range [0..n)
, and "take" the element from the list if r < k
.
In terms of the implementation above, k = P - dst
, and n = list.size() - src
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With