Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TreeSet: number of elements less than a value efficiently

I need a way to calculate the number of elements less than X in a TreeSet of Integers really fast.

I can use the

  • subSet()
  • headSet()
  • tailSet()

methods but they are really slow (I just need the count, not the numbers themselves). Is there a way?

Thank you.


EDIT:

I found a workaround that makes things a lot faster! I am using BitSet and it's cardinality() method. I create a BitSet at first and for every element added to the TreeSet I set the corresponding index in BitSet. Now, to count the number of elements less than X I use:

bitset.get(0, X+1).cardinality()

This is much faster compared with treeset.subSet(0, true, X, true).size().

Anyone knows why? I assume BitSet.cardinality() doesn't use linear search.

like image 771
mnmp Avatar asked Dec 23 '15 02:12

mnmp


2 Answers

Since all answers so far point to data structures different than Java's TreeSet, I would suggest the Fenwick tree, which has O(log(N)) for updates and queries; see the link for Java implementation.

like image 111
P Marecki Avatar answered Oct 21 '22 06:10

P Marecki


How fast does 'really fast' need to be? Roughly how many elements do you have?

subSet()/headSet()/tailSet() are O(1) because they return a view of the original treeset, but if you size() your subSet() you are still iterating over all the original elements, hence O(N).

Are you using Java 8? This will be about the same but you can parallelise the cost.

Set<Integer> set = new TreeSet<>();
// .. add things to set

long count = set.parallelstream().filter(e -> e < x).count();

NB EDIT

With further exploration and testing I cannot substantiate the claim "if you size() your subSet() you are still iterating over all the original elements". I was wrong. parallelstream().count() on this 4 core machine was ~30% slower than subSet().size()

like image 24
KarlM Avatar answered Oct 21 '22 04:10

KarlM