Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to ignore accent search in Solr

Tags:

solr

I am using solr as a search engine. I have a case where a text field contains accent text like "María". When user search with "María", it is giving resut. But when user search with "Maria" it is not giving any result.

My schema definition looks like below:

<fieldtype name="my_text" class="solr.TextField">
       <analyzer type="Index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>

       </analyzer>
</fieldtype>

Please help to solve this issue.

like image 737
pavan kumar Avatar asked Apr 19 '14 12:04

pavan kumar


1 Answers

If you're on solr > 3.x you can try using solr.ASCIIFoldingFilterFactory which will change all the accented characters to their unaccented versions from the basic ascii 127-character set.

Remember to put it after any stemming filter you have configured (you're not using one, so you should be ok).

So your config could look like:

<fieldtype name="my_text" class="solr.TextField">
       <analyzer type="Index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ASCIIFoldingFilterFactory"/>
           <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ASCIIFoldingFilterFactory"/>

       </analyzer>
</fieldtype>
like image 198
soulcheck Avatar answered Oct 11 '22 15:10

soulcheck