I am trying to search a Lucene index. I want only unique results, but the search also returns duplicates. I searched on Google and found that this can be done with the help of a collector. How can I achieve this?
I am using the following code:
File outputdir = new File("path upto lucene directory");
Directory directory = FSDirectory.open(outputdir);
// Open the index read-only; the variable needs a name as well as a type.
IndexSearcher indexSearcher = new IndexSearcher(directory, true);
QueryParser queryparser = new QueryParser(Version.LUCENE_36, "keyword", new StandardAnalyzer(Version.LUCENE_36));
Query query = queryparser.parse("central");
// maxhits is the maximum number of hits to return, defined elsewhere.
TopDocs topdocs = indexSearcher.search(query, maxhits);
ScoreDoc[] score = topdocs.scoreDocs;
int length = score.length;
Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries). To perform a single character wildcard search use the "?" symbol. To perform a multiple character wildcard search use the "*" symbol. You can also use the wildcard searches in the middle of a term.
Lucene scoring uses a combination of the Vector Space Model (VSM) of Information Retrieval and the Boolean model to determine how relevant a given Document is to a User's query.
Why is Lucene faster? Lucene searches data very quickly because it uses an inverted index. Normally, data sources structure the data as objects or records, which in turn have fields and values.
Are you indexing the content before each search?
If so, I suggest you separate the indexing code from the searching code. If you run this script several times without deleting the index folder, Lucene does not overwrite the index; it adds the same content to the index again. I think this is why you get duplicate results.
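To illustrate why repeated indexing runs produce duplicate hits, here is a toy model in plain Java. It is not Lucene code: `ToyIndex`, its methods, and the id values are hypothetical names invented for this sketch. In Lucene itself, the update-by-key behaviour roughly corresponds to `IndexWriter.updateDocument(Term, Document)`, assuming each document carries a unique id field.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory stand-in for an index (hypothetical, not part of Lucene).
class ToyIndex {
    // Each entry is {id, content}.
    private final List<String[]> docs = new ArrayList<>();

    // Always appends, like re-running an indexing script on an existing index.
    void add(String id, String content) {
        docs.add(new String[] {id, content});
    }

    // Removes any entry with the same id first, then appends the new one.
    void update(String id, String content) {
        docs.removeIf(d -> d[0].equals(id));
        add(id, content);
    }

    // Counts entries whose content contains the given term.
    int countMatches(String term) {
        int n = 0;
        for (String[] d : docs) {
            if (d[1].contains(term)) {
                n++;
            }
        }
        return n;
    }
}
```

Calling `add` twice with the same document leaves two matching entries, while calling `update` twice with the same id leaves one.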
You could add a field named, for example, "duplicate", and set its value to "true" at indexing time whenever the document already has a duplicate in the index.
You can then exclude the flagged documents when searching:
Query query = queryparser.parse("central -duplicate:true");
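If the index cannot be rebuilt, another option is to drop repeated hits after the search. The sketch below is plain Java, not a Lucene collector: `UniqueHits` and `uniqueInOrder` are hypothetical names. It assumes you have already read one key per hit into a list, for example with `indexSearcher.doc(score[i].doc).get("keyword")`, and that "keyword" is a stored field.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps the first occurrence of each key, in hit order.
class UniqueHits {
    static List<String> uniqueInOrder(List<String> keys) {
        // LinkedHashSet drops repeats and preserves insertion order.
        Set<String> seen = new LinkedHashSet<>(keys);
        return new ArrayList<>(seen);
    }
}
```

Because hits arrive sorted by score, keeping the first occurrence keeps the best-scoring copy of each key. Note that this shrinks the result list below `maxhits` when duplicates are present.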