
Stop words in Sitecore

We are using Lucene for text search as part of Sitecore. Is there any way to ignore stop words (like a, an, the...) in the Sitecore search?

rahul asked Feb 02 '11 07:02


2 Answers

By default, Sitecore uses the Lucene standard analyzer, Lucene.Net.Analysis.Standard.StandardAnalyzer. You can see this defined in the /configuration/sitecore/search/analyzer element of the web.config file. One of the constructors of the StandardAnalyzer class accepts an array of strings that it will treat as stop words. By default it uses a hardcoded list of stop words, which includes:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

If you'd like to override this behavior, I think you should inherit from StandardAnalyzer and give your class a default constructor that passes stop words from another source to the base constructor, instead of relying on the hardcoded array. You have various options, even reading them from a text file. Don't forget to replace the standard class with yours in web.config.
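
As a rough sketch of the text-file option, the stop words could be loaded into a Hashtable before being handed to a StandardAnalyzer constructor that accepts stop words. StopWordLoader, its Load method, the one-word-per-line format and the '#' comment convention are all hypothetical choices made for illustration; none of them come from Sitecore or Lucene.Net.

```csharp
using System.Collections;
using System.IO;

// Hypothetical helper (not part of Sitecore or Lucene.Net): reads stop words
// from a plain-text source, one word per line.
public static class StopWordLoader
{
    public static Hashtable Load(TextReader reader)
    {
        var stopWords = new Hashtable();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Lower-case each word, since StandardAnalyzer lower-cases tokens
            // before it filters out stop words.
            string word = line.Trim().ToLowerInvariant();

            // Skip blank lines and lines starting with '#'
            // (treated as comments in this sketch).
            if (word.Length == 0 || word[0] == '#')
            {
                continue;
            }

            // Key and value are both the word itself.
            stopWords[word] = word;
        }
        return stopWords;
    }
}
```

It might be called as StopWordLoader.Load(new StreamReader(path)), where path points to your own stop word file, or with a StringReader when testing.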

See other constructors of StandardAnalyzer class for more details. .NET Reflector is your friend here.

Yan Sklyarenko answered Oct 02 '22 18:10


An example for Yan's post:

using System.Collections;

public class CaseAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
{
   // Empty table: no stop words, so words such as "by" are indexed and searchable.
   // To treat "by" as a stop word instead, use: new Hashtable { { "by", "by" } }
   private static readonly Hashtable stopWords = new Hashtable();

   public CaseAnalyzer() : base(Lucene.Net.Util.Version.LUCENE_29, stopWords)
   {
   }
}

This should be registered in web.config under

/configuration/sitecore/search/analyzer

An example of the analyzer registration:

<caseanalyzer type="EBF.Business.Search.Analyzers.CaseAnalyzer, EBF.Business, Version=1.0.0.0, Culture=neutral"/>

Lastly, you just need to reference your analyzer in the search configuration like this:

<Analyzer ref="search/caseanalyzer" />
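
Put together, the two pieces might sit in web.config roughly as sketched below. This is only an illustration under assumptions: the type attribute shown for the default analyzer and the placement of the custom element next to it (inferred from the ref value search/caseanalyzer) are not confirmed here, and the surrounding index definition is left out because it varies by installation.

```xml
<!-- web.config sketch; only the elements discussed above are shown -->
<configuration>
  <sitecore>
    <search>
      <!-- default analyzer (type attribute is an assumption), left unchanged -->
      <analyzer type="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
      <!-- custom analyzer, assumed to be resolvable as "search/caseanalyzer" -->
      <caseanalyzer type="EBF.Business.Search.Analyzers.CaseAnalyzer, EBF.Business, Version=1.0.0.0, Culture=neutral" />
      <!-- Inside the definition of the index that should use it
           (surrounding elements omitted), replace the existing reference with:
           <Analyzer ref="search/caseanalyzer" /> -->
    </search>
  </sitecore>
</configuration>
```

The index generally needs to be rebuilt after the analyzer changes, since terms already written with the old analyzer stay in the index as they were.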

2 revs answered Oct 02 '22 20:10