I am looking to perform a query for the purposes of maintaining internal integrity; for example, removing all traces of a particular field/value from the index. Therefore it's important that I find all matching documents (not just the top n docs), but the order they are returned in is irrelevant.
According to the docs, it looks like I need to use the Searcher.Search( Query, Collector ) method, but there's no built-in Collector class that does what I need.
Should I derive my own Collector for this purpose? What do I need to keep in mind when doing that?
Turns out this was a lot easier than I expected. I just used the example implementation at http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Collector.html and recorded the doc numbers passed to the Collect() method in a List, exposing this as a public Docs property.
I then simply iterate this property, passing the number back to the Searcher to get the proper Document:
var searcher = new IndexSearcher( reader );
var collector = new IntegralCollector(); // my custom Collector
searcher.Search( query, collector );
var result = new Document[ collector.Docs.Count ];
for ( int i = 0; i < collector.Docs.Count; i++ )
    result[ i ] = searcher.Doc( collector.Docs[ i ] );
searcher.Close(); // this is probably not needed
reader.Close();
So far it seems to be working fine in preliminary tests.
Update: Here's the code for IntegralCollector:
internal class IntegralCollector: Lucene.Net.Search.Collector {
    private int _docBase;
    private List<int> _docs = new List<int>();
    public List<int> Docs {
        get { return _docs; }
    }
    public override bool AcceptsDocsOutOfOrder() {
        return true;
    }
    public override void Collect( int doc ) {
        _docs.Add( _docBase + doc );
    }
    public override void SetNextReader( Lucene.Net.Index.IndexReader reader, int docBase ) {
        _docBase = docBase;
    }
    public override void SetScorer( Lucene.Net.Search.Scorer scorer ) {
    }
}
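The one thing worth keeping in mind is why Collect() adds _docBase: the searcher walks the index one segment at a time, and the doc number passed to Collect() is relative to the current segment, so it has to be offset by the docBase from the most recent SetNextReader() call to become an index-wide doc number. Below is a minimal sketch of that bookkeeping with no Lucene.Net dependency. RebasingCollector and DocBaseDemo are hypothetical names made up for illustration; they only mimic the call order and are not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Collector: it only mimics how per-segment
// doc numbers are reported together with each segment's docBase.
public class RebasingCollector
{
    private int _docBase;
    private readonly List<int> _docs = new List<int>();

    public IList<int> Docs { get { return _docs; } }

    // Called once per segment, before any hits from that segment arrive.
    public void SetNextReader(int docBase) { _docBase = docBase; }

    // Called with a doc number relative to the current segment.
    public void Collect(int doc) { _docs.Add(_docBase + doc); }
}

public static class DocBaseDemo
{
    // segmentSizes[i] is the number of docs in segment i;
    // hits[i] holds the segment-relative doc numbers that matched in segment i.
    public static IList<int> CollectAll(int[] segmentSizes, int[][] hits)
    {
        var collector = new RebasingCollector();
        int docBase = 0;
        for (int i = 0; i < segmentSizes.Length; i++)
        {
            collector.SetNextReader(docBase);
            foreach (int doc in hits[i])
                collector.Collect(doc);
            docBase += segmentSizes[i]; // the next segment starts after this one
        }
        return collector.Docs;
    }
}
```

For example, with two segments of 3 and 2 docs, where docs 0 and 2 match in the first segment and doc 1 matches in the second, CollectAll returns 0, 2, 4. Without the offset, the hit in the second segment would be recorded as 1 and the wrong Document would come back from searcher.Doc().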