
How can I use Lucene's PriorityQueue when I don't know the max size at create time?

I built a custom collector for Lucene.Net, but I can't figure out how to order (or page) the results. Every time Collect is called, I can add the result to an internal PriorityQueue, which I understand is the correct way to do this.

I extended the PriorityQueue, but it requires a size parameter on creation: you have to call Initialize in the constructor and pass in the max size.

However, in a collector the searcher just calls Collect each time it finds a new result, so I don't know how many results there will be when I create the PriorityQueue. Because of this, I can't figure out how to make the PriorityQueue work.

I realize I'm probably missing something simple here...

Deane asked Oct 29 '11 01:10




1 Answer

PriorityQueue is not a SortedList or SortedDictionary. It is a kind of sorting implementation that returns the top M results (M being your PriorityQueue's size) out of N elements. You can add as many items as you want with InsertWithOverflow, but the queue will only hold the top M of them.
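To make the "top M of N" behaviour concrete, here is a minimal, self-contained sketch of a size-bounded min-heap in plain C#. `TopM<T>`, `Insert` and `DrainDescending` are hypothetical names for illustration only, not Lucene.Net's API; the sketch just follows the same idea: the root is the smallest item kept, so once the heap is full a new item either evicts it or is dropped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration class, not part of Lucene.Net.
// Keeps only the maxSize greatest items seen, using a binary min-heap
// whose root (_heap[0]) is the smallest item currently kept.
public class TopM<T>
{
    private readonly List<T> _heap = new List<T>();
    private readonly int _maxSize;
    private readonly Comparison<T> _compare;

    public TopM(int maxSize, Comparison<T> compare)
    {
        _maxSize = maxSize;
        _compare = compare;
    }

    public int Count { get { return _heap.Count; } }

    // Similar in spirit to InsertWithOverflow: any number of calls is fine,
    // but once full, an item only gets in by evicting the current smallest.
    public void Insert(T item)
    {
        if (_heap.Count < _maxSize)
        {
            _heap.Add(item);
            SiftUp(_heap.Count - 1);
        }
        else if (_maxSize > 0 && _compare(item, _heap[0]) > 0)
        {
            _heap[0] = item;   // replace the smallest kept item
            SiftDown(0);
        }
        // otherwise the item is not larger than anything kept and is dropped
    }

    // Removes all kept items and returns them from greatest to smallest.
    public List<T> DrainDescending()
    {
        var result = new List<T>(_heap.Count);
        while (_heap.Count > 0)
        {
            result.Add(_heap[0]);              // smallest remaining item
            int last = _heap.Count - 1;
            _heap[0] = _heap[last];
            _heap.RemoveAt(last);
            if (_heap.Count > 0) SiftDown(0);
        }
        result.Reverse();                      // popped smallest-first
        return result;
    }

    private void SiftUp(int i)
    {
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (_compare(_heap[i], _heap[parent]) >= 0) break;
            Swap(i, parent);
            i = parent;
        }
    }

    private void SiftDown(int i)
    {
        int n = _heap.Count;
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && _compare(_heap[left], _heap[smallest]) < 0) smallest = left;
            if (right < n && _compare(_heap[right], _heap[smallest]) < 0) smallest = right;
            if (smallest == i) return;
            Swap(i, smallest);
            i = smallest;
        }
    }

    private void Swap(int a, int b)
    {
        T tmp = _heap[a];
        _heap[a] = _heap[b];
        _heap[b] = tmp;
    }
}
```

Feeding it 5, 1, 9, 3, 7, 2, 8 with maxSize 3 leaves only 9, 8 and 7; the memory used stays bounded by M no matter how many items N are inserted.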

Suppose your search resulted in 1,000,000 hits. Would you return all of the results to the user? A better way is to return the top 10 elements to the user (using PriorityQueue(10)); if the user requests the next 10 results, you can run a new search with PriorityQueue(20) and return the next 10 elements, and so on. This is the trick most search engines, like Google, use.
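The paging arithmetic can be sketched as follows, assuming a 1-based page number: for page p with page size s, collect the top p * s hits and skip the first (p - 1) * s. `Paging.GetPage` is a hypothetical helper, and LINQ's `OrderByDescending(...).Take(...)` stands in for the size-limited priority queue only to keep the example short; plain integer scores stand in for real hits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Hypothetical helper illustrating the "re-run with a bigger queue" trick.
    // To show page `page` (1-based) of `pageSize` hits, keep the top
    // page * pageSize hits, then skip the (page - 1) * pageSize already shown.
    public static List<int> GetPage(IEnumerable<int> scores, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        int queueSize = page * pageSize;           // 10, 20, 30, ... for pageSize 10
        return scores
            .OrderByDescending(s => s)
            .Take(queueSize)                        // what the queue would hold
            .Skip((page - 1) * pageSize)            // drop the earlier pages
            .ToList();
    }
}
```

With 25 hits scored 1 to 25 and a page size of 10, page 1 holds scores 25 to 16, page 2 holds 15 to 6, and page 3 holds only the remaining 5 to 1.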

You wrote: "Every time Commit gets called, I can add the result to an internal PriorityQueue."

I don't understand the relationship between Commit and search, so here is a sample usage of PriorityQueue instead:

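using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Keeps the maxSize "greatest" documents according to LessThan.
public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
{
    public CustomQueue(int maxSize) : base()
    {
        Initialize(maxSize);
    }

    public override bool LessThan(Document a, Document b)
    {
        // Example only: order by a stored string field named "field1"
        // (hypothetical field name; replace with your own comparison).
        return string.CompareOrdinal(a.Get("field1"), b.Get("field1")) < 0;
    }
}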

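public class MyCollector : Lucene.Net.Search.Collector
{
    CustomQueue _queue = null;
    IndexReader _currentReader;

    public MyCollector(int maxSize)
    {
        _queue = new CustomQueue(maxSize);
    }

    // Gives access to the collected documents once the search has finished.
    public CustomQueue Queue
    {
        get { return _queue; }
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }

    public override void Collect(int doc)
    {
        // doc is relative to the current segment, so load it from _currentReader.
        // InsertWithOverflow keeps at most maxSize documents and drops the
        // least one once the queue is full.
        _queue.InsertWithOverflow(_currentReader.Document(doc));
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _currentReader = reader;
    }

    public override void SetScorer(Scorer scorer)
    {
        // Scores are not needed for this field-based ordering.
    }
}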

searcher.Search(query, new MyCollector(10)); // 1st page: top 10, show items 1-10
searcher.Search(query, new MyCollector(20)); // 2nd page: top 20, show items 11-20
searcher.Search(query, new MyCollector(30)); // 3rd page: top 30, show items 21-30

EDIT for @nokturnal

using System;
using System.Collections.Generic;

public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
                                where TComp : IComparable<TComp>
{
    Func<TObj, TComp> _KeySelector;

    public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
    {
        _KeySelector = keySelector;
        Initialize(size);
    }

    public override bool LessThan(TObj a, TObj b)
    {
        return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
    }

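    // Note: enumerating Items pops, and therefore empties, the queue.
    // Items come out from the least key to the greatest.
    public IEnumerable<TObj> Items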
    {
        get
        {
            int size = Size();
            for (int i = 0; i < size; i++)
                yield return Pop();
        }
    }
}

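var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
// Fill the queue first, e.g. pq.InsertWithOverflow(document) for each hit, then:
foreach (var item in pq.Items)
{
    // Documents arrive ordered by "SomeField", least first.
}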
L.B answered Oct 07 '22 08:10