
Solr fuzzy match has better score than exact match

Tags:

solr

lucene

I'm doing a fuzzy search in Solr, and in rare cases an exact match scores lower than a fuzzy match. Using debugQuery I found the reason: the fuzzy match matched 3 different words in the document, while the exact match matched only one, so the "sum of" the 3 matches produced a higher value than the single match. Here is part of the "explain".

Is there any way to configure Solr to rank exact matches higher than fuzzy matches, even in this case?

P.S. I'm already using omitTermFreqAndPositions="true" and omitNorms="true", but that doesn't help when the fuzzy query matches several different words.
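A toy model may help show why the "sum of" node favors the fuzzy match. This is a simplified sketch, not Lucene's actual similarity formula: it assumes each matched term contributes a fixed weight times a made-up similarity factor (1.0 for the exact term, lower for fuzzy expansions), and that the document score is the plain sum. A constant weight per term roughly corresponds to having term frequencies and norms switched off, which is why omitting them does not change the outcome. The terms and numbers are hypothetical.

```python
# Toy model of the "sum of" node in Solr's explain output.
# Assumptions (not Lucene's real formula): every matched term contributes
# weight * similarity, and the document score is the plain sum.

def toy_score(matched_terms):
    """Sum weight * similarity over all (term, weight, similarity) tuples."""
    return sum(weight * similarity for _term, weight, similarity in matched_terms)

# Document containing the exact query term "color" once.
exact_doc = [("color", 1.0, 1.0)]

# Document containing three different words within edit distance 2 of
# "color"; the similarity values are invented for illustration.
fuzzy_doc = [
    ("colon", 1.0, 0.7),   # 1 edit
    ("colors", 1.0, 0.8),  # 1 edit
    ("cooler", 1.0, 0.5),  # 2 edits
]

print("exact:", toy_score(exact_doc))
print("fuzzy:", toy_score(fuzzy_doc))
```

Each fuzzy term is individually weaker than the exact term, but the three contributions add up to more than the single exact match, which is the effect described in the question.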

asked Dec 11 '13 at 17:12 by gray



2 Answers

Combine an exact-match query with a higher boost and the fuzzy query in a Boolean OR, so that exact matches rank higher. Don't worry about the extra work for Solr; it is built to handle very complex Lucene query trees, and combining several queries to get the expected relevancy ranking is common practice. If you agree, please accept my answer.
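A minimal sketch of this approach, building a query string in the standard Lucene/Solr query syntax (`^` for a boost, `~N` for a fuzzy term with at most N edits). The field name "name", the boost of 10 and the edit distance of 2 are assumptions for illustration, and the term is assumed to contain no characters that need escaping.

```python
# Sketch of the suggestion above: OR a boosted exact-term clause with a
# fuzzy clause. Field name, boost and edit distance are hypothetical
# defaults; assumes a single plain term with no special query characters.

def exact_or_fuzzy_query(field, term, exact_boost=10, max_edits=2):
    """Build 'field:term^boost OR field:term~max_edits'."""
    exact_clause = f"{field}:{term}^{exact_boost}"
    fuzzy_clause = f"{field}:{term}~{max_edits}"
    return f"{exact_clause} OR {fuzzy_clause}"

q = exact_or_fuzzy_query("name", "color")
print(q)  # name:color^10 OR name:color~2
```

A document containing the exact term matches both clauses, so it gets the boosted exact score on top of its fuzzy score, while a document with only near-miss words gets the fuzzy part alone. The boost has to be large enough to outweigh several fuzzy terms adding up, so it is worth checking the result with debugQuery on your own data.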

answered Oct 16 '22 at 12:10 by Arun


I had a similar problem and solved it by using copyField and doing exact and fuzzy (phonetic, in my case) matching on separate fields.

Then I used eDisMax's qf parameter to give matches on the exact field a higher weight than matches on the fuzzy field.
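A sketch of the request parameters for this setup, in Python. It assumes a schema where a source field is copied (via copyField) into a hypothetical `name_exact` field and a hypothetical `name_phonetic` field with a phonetic filter; the core URL and the boost of 10 are also placeholders. The code only builds the URL; sending it with an HTTP client is left out.

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust host, port and core name for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

def edismax_params(user_query, exact_field="name_exact",
                   fuzzy_field="name_phonetic", exact_boost=10):
    """Build eDisMax parameters that search both fields, weighting the exact one."""
    return {
        "defType": "edismax",
        "q": user_query,
        # The same query text is searched in both fields; scores from the
        # exact field are multiplied by exact_boost.
        "qf": f"{exact_field}^{exact_boost} {fuzzy_field}",
    }

params = edismax_params("color")
url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

DisMax-style parsers score each query term by its best-matching field (plus a tie-breaker share of the other fields) rather than simply summing all fields, so a boosted match on the exact field tends to dominate a phonetic-only match.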

answered Oct 16 '22 at 12:10 by Tagar