How to configure stemming in Solr?

Question

I add to solr index: "American". When I search by "America" there is no results.

How should schema.xml be configured to get results?

current configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                <filter class="solr.PorterStemFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                <filter class="solr.PorterStemFilterFactory"/>
            </analyzer>
        </fieldType>

Marko Bonaci · Accepted Answer

Why would you have two stemmers?
Try removing EnglishPorterFilterFactory (deprecated) from both of your analyzer types, rebuild the index and then try whether search for American will yield America.

If that wont work, the other thing you can try is to remove both of your stemmer filters and add SnowballPorterFilterFactory with language="English" instead.

Kaidul · Answer

You have to use one stemmer for an analyzer and EnglishPorterFilterFactory is deprecated as @Marko already mentioned. So you should remove this one from analyzers.

I used SnowballPorterFilterFactory for both index and query analyzers -

<fieldType name="text_stem">
    <analyzer> 
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"/>
        <!-- other filters -->
    </analyzer>
</fieldType>

The fieldType definition is pretty self explanatory, but just in case:

Tokenizer solr.WhitespaceTokenizerFactory: This operation will break up the sentences into words, using whitespaces as delimiters.
Filter solr.SnowballPorterFilterFactory: This filter will apply a stemming algorithm to each word (token). In the example above I have chosen the Snowball Porter stemming algorithm. Solr provides a few implementation of popular stemming algorithms.

You can browse several other stemming algorithms e.g. HunspellStemFilterFactory, KStemFilterFactory too.

How to configure stemming in Solr?

Tags:

solr

stemming

user657009

2 Answers

Marko Bonaci

Kaidul

Recent Activity

Donate For Us

How to configure stemming in Solr?

Tags:

solr

stemming

user657009

2 Answers

Marko Bonaci

Kaidul

Related questions

Recent Activity

Donate For Us