
Does the Sitecore 7 ContentSearch API remove stop words from queries?

I've found that searches containing 'of', 'and', 'the', etc. do not return results because Lucene has removed stop words. So if I search for an item with the title "Aftermath of the first world war", I get zero results.

But if I strip 'of' and 'the', so that I am searching for "aftermath first world war", I get the expected document back.

Does the ContentSearch API remove stop words from queries? Is this something one can configure Lucene to remove? Or should I remove these stop words before building my query?

Thanks Adam

Adamsimsy asked Feb 05 '14 17:02


1 Answer

You can configure the Sitecore Standard Analyzer to accept your own custom set of stop words. Create a text file containing the stop words (one stop word per line), and then make the config changes below in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file:

<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param desc="defaultAnalyzer" type="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net">
    <param hint="version">Lucene_30</param>
    <param desc="stopWords" type="System.IO.FileInfo, mscorlib">
      <param hint="fileName">[FULL_PATH_TO_SITECORE_ROOT_FOLDER]\Data\indexes\stopwords.txt</param>
    </param>
  </param>
</param>
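
As a rough illustration of the file format, a stopwords.txt might look like the sketch below, with one stop word per line. The particular words are only an assumption for the example, not a recommended list. The idea is that any word left out of the file (here 'of' and 'the') should no longer be dropped by the analyzer, so a title such as "Aftermath of the first world war" can be matched as typed.

```text
a
an
and
are
as
at
be
but
by
```

Because the analyzer is normally applied when documents are indexed as well as when queries are parsed, the index will most likely need to be rebuilt before the change shows up in search results.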

Further reading: I have written a blog post about this issue that might be of help: http://blog.horizontalintegration.com/2014/03/19/sitecore-standard-analyzer-managing-you-own-stop-words-filter/

Sheetal Jain answered Sep 21 '22 17:09