Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fixed-size collection that keeps top (N) values in Java

Tags:

java

I need to keep top N(< 1000) integers while trying to add values from a big list of integers(around a million sized lazy list). I want to be try adding values to a collection but that needs to keep only the top N(highest values) integers. Is there any preferred data structure to use for this purpose ?

like image 389
Rajat Gupta Avatar asked Aug 03 '14 19:08

Rajat Gupta


People also ask

Do queues have a fixed size?

It is a queue, and the size of the queue is fixed, it means that the queue cannot hold more than specified limit of number of data. In the previous picture the size of the queue is 5 because it is holding 5 integers.

How do you restrict the size of a queue in Java?

ArrayBlockingQueue is bounded, blocking queue that stores the elements internally backed by an array. ArrayBlockingQueue class is a member of the Java Collections Framework. Bounded means it will have a fixed size, you can not store number the elements more than the capacity of the queue.


3 Answers

I'd suggest to use some sorted data structure, such as TreeSet. Before insertion, check the number of items in the set, and if it reached 1000, remove the smallest number if it's smaller than the newly added number, and add the new number.

TreeSet<Integer> set = ...;

public void add (int n) {
    if (set.size () < 1000) {
       set.add (n);
    } else {
       Integer first = set.first();
       if (first.intValue() < n) {
          set.pollFirst();
          set.add (n);
       }
    }
}
like image 66
Eran Avatar answered Oct 17 '22 12:10

Eran


One efficient solution is a slightly tweaked array-based priority queue using a binary min-heap.

First N integers are simply added to the heap one by one or you can build it from array of first N integers (slightly faster).

After that, compare the incoming integer with the root element (which is MIN value found so far). If the new integer is larger that that, simply replace the root with this new integer and perform down-heap operation (i.e. trickle down the new integer until both its children are smaller or it becomes a leaf). The data structure guarantees you will always have N largest integers so far with average addition time of O(log N).

Here is my C# implementation, the mentioned method is named "EnqueueDown". The "EnqueueUp" is a standard enqueue operation that expands the array, adds new leaf and trickles it up.

I have tested it on 1M numbers with max heap size of 1000 and it runs under 200 ms:

namespace ImagingShop.Research.FastPriorityQueue
{
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.CompilerServices;

    public sealed class FastPriorityQueue<T> : IEnumerable<Tuple<T, float>>
    {
        private readonly int capacity;
        private readonly Tuple<T, float>[] nodes;

        private int count = 0;

        public FastPriorityQueue(int capacity)
        {
            this.capacity = capacity;
            this.nodes = new Tuple<T, float>[capacity];
        }

        public int Capacity => this.capacity;

        public int Count => this.count;

        public T FirstNode => this.nodes[0].Item1;

        public float FirstPriority => this.nodes[0].Item2;

        public void Clear()
        {
            this.count = 0;
        }

        public bool Contains(T node) => this.nodes.Any(tuple => Equals(tuple.Item1, node));

        public T Dequeue()
        {
            T nodeHead = this.nodes[0].Item1;

            int index = (this.count - 1);

            this.nodes[0] = this.nodes[index];
            this.count--;

            DownHeap(index);

            return nodeHead;
        }

        public void EnqueueDown(T node, float priority)
        {
            if (this.count == this.capacity)
            {
                if (priority < this.nodes[0].Item2)
                {
                    return;
                }

                this.nodes[0] = Tuple.Create(node, priority);

                DownHeap(0);

                return;
            }

            int index = this.count;

            this.count++;

            this.nodes[index] = Tuple.Create(node, priority);

            UpHeap(index);
        }

        public void EnqueueUp(T node, float priority)
        {
            int index = this.count;

            this.count++;

            this.nodes[index] = Tuple.Create(node, priority);

            UpHeap(index);
        }

        public IEnumerator<Tuple<T, float>> GetEnumerator()
        {
            for (int i = 0; i < this.count; i++) yield return this.nodes[i];
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void DownHeap(int index)
        {
            while (true)
            {
                int indexLeft = (index << 1);
                int indexRight = (indexLeft | 1);

                int indexMin = ((indexLeft < this.count) && (this.nodes[indexLeft].Item2 < this.nodes[index].Item2))
                    ? indexLeft
                    : index;

                if ((indexRight < this.count) && (this.nodes[indexRight].Item2 < this.nodes[indexMin].Item2))
                {
                    indexMin = indexRight;
                }

                if (indexMin == index)
                {
                    break;
                }

                Flip(index, indexMin);

                index = indexMin;
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void Flip(int indexA, int indexB)
        {
            var temp = this.nodes[indexA];
            this.nodes[indexA] = this.nodes[indexB];
            this.nodes[indexB] = temp;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void UpHeap(int index)
        {
            while (true)
            {
                if (index == 0)
                {
                    break;
                }

                int indexParent = (index >> 1);

                if (this.nodes[indexParent].Item2 <= this.nodes[index].Item2)
                {
                    break;
                }

                Flip(index, indexParent);

                index = indexParent;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

The basic implementation is taken from "Cormen, Thomas H. Introduction to algorithms. MIT press, 2009."

like image 30
Libor Avatar answered Oct 17 '22 14:10

Libor


Google Guava MinMaxPriorityQueue class.

You can also use custom sorting by using a comparator (Use orderedBy(Comparator<B> comparator) method).

Note: This collection is NOT a sorted collection.

See javadoc

Example:

@Test
public void test() {
    final int maxSize = 5;

    // Natural order
    final MinMaxPriorityQueue<Integer> queue = MinMaxPriorityQueue
            .maximumSize(maxSize).create();
    queue.addAll(Arrays.asList(10, 30, 60, 70, 20, 80, 90, 50, 100, 40));

    assertEquals(maxSize, queue.size());
    assertEquals(new Integer(50), Collections.max(queue));

    System.out.println(queue);
}

Output:

[10, 50, 40, 30, 20]

like image 8
Pritesh Mhatre Avatar answered Oct 17 '22 14:10

Pritesh Mhatre