What is the fastest way to count all results for a given Query in Lucene?
Since 1. will do scoring on all docs, and 2. could have an upfront hit due to loading of the FieldCache, I assume the answer is 3. It just seems odd that Lucene doesn't provide such a collector out of the box?
The codes should be here now: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java
You are right that #3 will be quicker, but I don't think it's because of scoring. There is a much faster way, skip to the bottom if you don't care about the reasoning behind this.
The performance loss of #1 comes from the fact that the TopDocs collector will keep the docs in a priority queue, which means that you will lose some time sorting them by score. (You will also eat up some memory, but since you're storing just a heap of int+float pairs, it's probably pretty minimal.)
As to why Lucene doesn't provide this out of the box: you generally don't want to find all results. That's why when you search, you say to only find the top n results. There are strong theoretical reasons for this. Even Google says "Showing 25 of about n results."
So my advice to you is the following: if you have a reasonable number of results, then using TopDocs.totalHits
won't be too bad performance-wise. If the totalHits
method gives you problems, I don't think that a custom collector will be much better. (TopDocs.totalHits will run in n log n time, and the custom collector will be linear. Depending on your set up, the log n coefficient may be relevant, or it may not.)
So, if you absolutely need this functionality, and TopDocs.totalHits
is too slow, I would recommend looking at the document frequency of the search terms. You could assume that frequency is independent (so p(A and B)=p(A)*p(B)) and make a pretty good guess from there. It will be very fast, because it's just a constant-time lookup for each term.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With