
Solr edismax wildcard search does not find original string

Tags:

solr

edismax

I have the following content in my Solr index: west indian cherry, in a field of type text_en (see below for the field definition).

When I search with cherr*, a match is found.
A search for cherri* also matches the word in the document.
But a search for cherry* does not match.

I suspect PorterStemFilterFactory is the cause, but I don't understand why (the query analyzer is the same as the index analyzer).


sample query

/solr/select?defType=edismax&q=cherry*

schema.xml

...
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
...

field analysis

index

org.apache.solr.analysis.StandardTokenizerFactory: cherry
org.apache.solr.analysis.LowerCaseFilterFactory: cherry
org.apache.solr.analysis.EnglishPossessiveFilterFactory: cherry
org.apache.solr.analysis.PorterStemFilterFactory: cherri <-- note the change from cherry to cherri

query

org.apache.solr.analysis.StandardTokenizerFactory: cherry
org.apache.solr.analysis.LowerCaseFilterFactory: cherry
org.apache.solr.analysis.EnglishPossessiveFilterFactory: cherry
org.apache.solr.analysis.PorterStemFilterFactory: cherri
Matej asked Feb 22 '23 08:02



1 Answer

The Solr wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers) states:

On wildcard and fuzzy searches, no text analysis is performed on the search word.

So a wildcard query does not undergo any analysis at query time, and the term you search for can differ from the term that was indexed.

The indexed term is cherri, so cherr* and cherri* match because both are prefixes of it, while cherry* is not a prefix of cherri and therefore matches no documents.
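To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is not Solr code: `INDEXED_TERMS` and `wildcard_matches` are hypothetical names used only for illustration. The stemmed term `cherri` is taken from the field analysis in the question; `west` and `indian` are assumed to pass through the analyzer unchanged.

```python
# Terms as stored in the index after analysis. "cherri" comes from the
# field analysis output in the question; "west" and "indian" are assumed
# to be left unchanged by the analyzer chain.
INDEXED_TERMS = ["west", "indian", "cherri"]


def wildcard_matches(query, indexed_terms=INDEXED_TERMS):
    """Simulate a trailing-wildcard query that bypasses analysis.

    The prefix is compared against the indexed terms exactly as typed;
    no stemming is applied to it.
    """
    if not query.endswith("*"):
        raise ValueError("expected a trailing-wildcard query such as 'cherr*'")
    prefix = query[:-1]
    return any(term.startswith(prefix) for term in indexed_terms)


for q in ("cherr*", "cherri*", "cherry*"):
    print(q, wildcard_matches(q))
```

Only `cherry*` reports no match here, which mirrors the three queries in the question: the raw prefix `cherry` is compared against the stemmed term `cherri` and is not a prefix of it.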

Jayendra answered Feb 23 '23 23:02
