How to get the unique results from Lucene index?

I am trying to search a Lucene index. I want unique results, but the search also returns duplicates. I searched on Google and found that this can be done with a collector. How can I achieve this?

I am using the following code:

File outputdir = new File("path upto lucene directory");
Directory directory = FSDirectory.open(outputdir);
// read-only searcher over the existing index
IndexSearcher indexSearcher = new IndexSearcher(directory, true);

QueryParser queryparser = new QueryParser(Version.LUCENE_36, "keyword", new StandardAnalyzer(Version.LUCENE_36));

Query query = queryparser.parse("central");

// maxhits is the maximum number of hits to return
TopDocs topdocs = indexSearcher.search(query, maxhits);
ScoreDoc[] score = topdocs.scoreDocs;
int length = score.length;
adesh singh asked Nov 18 '13 09:11

2 Answers

Are you indexing the content before each search?

If so, I suggest separating the indexing code from the searching code. If you run this script several times without deleting the index folder, Lucene doesn't overwrite the index; it adds the content to the index again. I think this is why you get duplicate results.
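
If the index already contains repeated documents and you cannot rebuild it right away, one stopgap is to filter the hits after the search. The sketch below is plain Java and does not call Lucene at all: it assumes you have already read one key value per hit, in rank order (for example from a stored "keyword" field via indexSearcher.doc(scoreDoc.doc).get("keyword"), which only works if that field is stored). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class UniqueHits {
    /**
     * Returns the keys with repeats removed, keeping the first
     * (highest-ranked) occurrence of each key in its original position.
     */
    public static List<String> uniqueInRankOrder(List<String> rankedKeys) {
        // LinkedHashSet drops repeats while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rankedKeys));
    }
}
```

This only hides duplicates in the result list. Since up to maxhits documents are fetched before filtering, the unique list can be shorter than maxhits, so fixing the index itself is still the cleaner solution.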

Chavjoh answered Jan 02 '23 13:01


You could add a field, named for example "duplicate", and set its value to "true" at indexing time when the document already has a duplicate in the index.

Then you can search with:

Query query = queryparser.parse("central -duplicate:true");
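
How the value of the "duplicate" field gets decided is not shown above. A minimal sketch in plain Java, assuming every document has some key that defines "same document" and that all documents are indexed in a single run (the set lives in memory only; across runs you would have to look the key up in the index instead). DuplicateFlagger and flagFor are hypothetical names, not Lucene API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class DuplicateFlagger {
    // Keys seen so far during this indexing run.
    private final Set<String> seenKeys = new HashSet<>();

    /**
     * Returns the value to store in the "duplicate" field:
     * "false" the first time a key is seen, "true" for every repeat.
     */
    public String flagFor(String key) {
        // Set.add returns false when the key is already present.
        return seenKeys.add(key) ? "false" : "true";
    }
}
```

While building each document you would store flagFor(key) in the "duplicate" field, so that the -duplicate:true clause in the query excludes every repeat and keeps the first copy.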
fatih answered Jan 02 '23 11:01