Migrating from Hit/Hits to TopDocs/TopDocCollector

Tags: java, lucene

I have existing code like this:

final Term t = /* ... */;
final Iterator i = searcher.search( new TermQuery( t ) ).iterator();
while ( i.hasNext() ) {
    Hit hit = (Hit)i.next();
    // "FILE" is the field that recorded the original file indexed
    File f = new File( hit.get( "FILE" ) );
    // ...
}

It's not clear to me how to rewrite the code using TopDocs/TopDocCollector and how to iterate over all results.

Asked Jun 10 '09 at 01:06 by Paul J. Lucas




1 Answer

You have to decide on an upper limit for the number of results you expect and pass it to the search. Then you iterate over all the ScoreDocs in the returned TopDocs.

final int MAX_RESULTS = 10000; // upper bound on the number of hits returned
final Term t = /* ... */;
final TopDocs topDocs = searcher.search( new TermQuery( t ), MAX_RESULTS );
// scoreDocs holds at most MAX_RESULTS hits, best score first
for ( ScoreDoc scoreDoc : topDocs.scoreDocs ) {
    Document doc = searcher.doc( scoreDoc.doc );
    // "FILE" is the field that recorded the original file indexed
    File f = new File( doc.get( "FILE" ) );
    // ...
}
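Conceptually, a search with a fixed limit only has to remember the best MAX_RESULTS hits while it scores the matching documents. The sketch below models that with java.util.PriorityQueue used as a bounded min-heap. It is a stdlib-only illustration, not Lucene's code: the TopNSketch and Hit names are hypothetical (Hit stands in for ScoreDoc), and the scores array plays the role of the per-document scores the searcher would compute.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Stdlib-only model of top-N hit collection; not Lucene's implementation. */
class TopNSketch {

    /** Hypothetical stand-in for Lucene's ScoreDoc: a document id and its score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * scores[doc] is the score of document `doc`. Keeps only the maxResults best
     * hits and returns them best first (ties broken by lower doc id).
     */
    static List<Hit> topN(float[] scores, int maxResults) {
        List<Hit> result = new ArrayList<>();
        if (maxResults <= 0) {
            return result;
        }
        // Min-heap on score: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < maxResults) {
                heap.add(new Hit(doc, scores[doc]));
            } else if (scores[doc] > heap.peek().score) {
                heap.poll(); // evict the weakest hit to make room
                heap.add(new Hit(doc, scores[doc]));
            }
        }
        result.addAll(heap);
        // Sort best first, the order TopDocs.scoreDocs uses.
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        return result;
    }
}
```

However many documents match, the heap never holds more than maxResults entries, which is why a fixed limit keeps memory bounded. In Lucene itself, topDocs.totalHits reports how many documents matched in total, so comparing it with topDocs.scoreDocs.length tells you whether MAX_RESULTS cut the list short.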

This is essentially what the Hits class does, except that it starts with a limit of 50 results; if you iterate past that, the search is run again to fetch more, which is usually wasteful. That is why Hits was deprecated.
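To see why repeating the search adds up, here is a back-of-the-envelope model. It assumes, hypothetically, that the first execution fetches `initial` results (50, per the paragraph above) and that each repeat starts again from the top and fetches twice as many. The doubling policy and the ResearchCostSketch name are illustrative assumptions, not a description of Lucene's source.

```java
/**
 * Model of a lazily filled result cache that re-runs the search whenever the
 * caller iterates past what it has fetched. The doubling policy is assumed.
 */
class ResearchCostSketch {

    /** Number of search executions needed to read the first `wanted` hits. */
    static int searches(int wanted, int initial) {
        return simulate(wanted, initial)[0];
    }

    /** Total result slots fetched over all executions. */
    static int totalFetched(int wanted, int initial) {
        return simulate(wanted, initial)[1];
    }

    private static int[] simulate(int wanted, int initial) {
        if (initial <= 0) {
            throw new IllegalArgumentException("initial must be positive");
        }
        int searches = 0;
        int fetched = 0;   // sum of the fetch sizes of every execution
        int cached = 0;    // hits available from the most recent execution
        int fetchSize = initial;
        while (cached < wanted) {
            searches++;
            fetched += fetchSize; // the whole top-k list is rebuilt each time
            cached = fetchSize;
            fetchSize *= 2;       // assumed growth policy for the next re-run
        }
        return new int[] {searches, fetched};
    }
}
```

Under these assumptions, reading 400 hits takes 4 executions and 750 fetched slots, whereas a single search with a limit of 400 runs once and fetches 400. Each execution also has to score the matching documents again, so the repeat count is what makes the lazy approach expensive.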

ADDED: If you can't put a limit on the number of results, use a HitCollector instead:

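final Term t = /* ... */;
// Collect the ids of all matching documents; no upper limit is imposed
final ArrayList<Integer> docs = new ArrayList<Integer>();
searcher.search( new TermQuery( t ), new HitCollector() {
    public void collect( int doc, float score ) {
        // Called once per matching document; just record the id here
        docs.add( doc );
    }
});

// Load the stored documents only after the search has finished
for ( Integer docid : docs ) {
    Document doc = searcher.doc( docid );
    // "FILE" is the field that recorded the original file indexed
    File f = new File( doc.get( "FILE" ) );
    // ...
}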
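The HitCollector approach is a callback: the searcher pushes every match into collect(doc, score), and the collector alone decides what to keep. The stdlib-only sketch below models that pattern. CollectAllSketch, DocCallback and the simulated search are hypothetical names; DocCallback is a functional interface so a lambda can be used, whereas Lucene 2.x's HitCollector is an abstract class (hence the anonymous subclass in the answer). The "match" rule (a positive score) is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of the HitCollector callback pattern (names are hypothetical). */
class CollectAllSketch {

    /** Stand-in for HitCollector: called once per matching document. */
    public interface DocCallback {
        void collect(int doc, float score);
    }

    /** Stand-in for searcher.search(query, collector): pushes every match to the callback. */
    static void search(float[] scores, DocCallback callback) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0f) { // assumed rule: a positive score means "matches"
                callback.collect(doc, scores[doc]);
            }
        }
    }

    /** Collects every matching doc id with no upper limit; documents are loaded afterwards. */
    static List<Integer> collectAll(float[] scores) {
        final List<Integer> docs = new ArrayList<>();
        // Keep the callback cheap: only record the id, as the answer's HitCollector does.
        search(scores, (doc, score) -> docs.add(doc));
        return docs;
    }
}
```

Unlike the bounded heap, the list grows with the number of matches, so memory is the price of having no limit. Lucene's HitCollector documentation also advises against calling Searcher.doc() inside collect(), since it runs in the inner search loop; that is why the answer gathers ids first and loads the documents in a second loop.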
Answered Oct 19 '22 at 12:10 by itsadok