Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get the N smallest [Comparable] items in a set

I have an unsorted Collection of objects [that are comparable], is it possible to get a sub list of the collection of the list without having to call sort?

I was looking at the possibility of doing a SortedList with a limited capacity, but that didn't look like the right option.

I could easily write this, but I was wondering if there was another way.

I am not able to modify the existing collection's structure.

like image 549
monksy Avatar asked Mar 29 '11 04:03

monksy


3 Answers

Since you don't want to call sort(), it seems like you are trying to avoid an O(n log(n)) runtime cost. There is actually a way to do that in O(n) time -- you can use a selection algorithm.

There are methods to do this in the Guava libraries (Google's core Java libraries); look in Ordering and check out:

  • public <E extends T> List<E> Ordering.leastOf(Iterable iterable, int k)
  • public <E extends T> List<E> Ordering.greatestOf(Iterable iterable, int k)

These are implementations of quickselect, and since they're written generically, you could just call them on your Set and get a list of the k smallest things. If you don't want to use the entire Guava libraries, the docs link to the source code, and I think it should be straightforward to port the methods to your project.

If you don't want to deviate too far from the standard libraries, you can always use a sorted set like TreeSet, though this gets you logarithmic insert/remove time instead of the nice O(1) performance of the hash-based Set, and it ends up being O(n log(n)) in the end. Others have mentioned using heaps. This will also get you O(n log(n)) running time, unless you use some of the fancier heap variants. There's a fibonacci heap implementation in GraphMaker if you're looking for one of those.

Which of these makes sense really depends on your project, but I think that covers most of the options.

like image 54
Todd Gamblin Avatar answered Nov 12 '22 06:11

Todd Gamblin


I would probably create a sorted set. Insert the first N items from your unsorted collection into your sorted set. Then for the remainder of your unsorted collection:

  1. insert each item in the sorted set
  2. delete the largest item from the sorted set
  3. Repeat until you've processed all items in the unsorted collection
like image 25
Jerry Coffin Avatar answered Nov 12 '22 05:11

Jerry Coffin


Yes, you can put all of them into a max heap data structure with a fixed size of N, conditionally, if the item is smaller than the largest in the max heap (by checking with the get() "peek" method). Once you have done so they will, by definition, be the N smallest. Optimal implementations will perform with O(M)+lg(N) or O(M) (where M is the size of the set) performance, which is theoretically fastest. Here's some pseudocode:

MaxHeap maxHeap = new MaxHeap(N);
for (Item x : mySetOfItems) {
  if (x < maxHeap.get()) {
    maxHeap.add(x);
  }
}

The Apache Commons Collections class PriorityBuffer seems to be their flagship binary heap data structure, try using that one.

like image 35
maerics Avatar answered Nov 12 '22 06:11

maerics