I've found that searches containing 'of', 'and', 'the', etc. return no results because Lucene removes stop words. So if I search for an item with the title "Aftermath of the first world war", I get zero results.
But if I strip 'of' and 'the' and search for "aftermath first world war" instead, I get the expected document back.
Does the ContentSearch API remove stop words from queries? Can Lucene be configured to remove them? Or should I remove these stop words myself before building my query?
Thanks Adam
You can configure the Sitecore standard analyzer to use your own custom set of stop words. Create a text file with the stop words (one per line), then make the config changes below in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file:
<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param desc="defaultAnalyzer" type="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net">
    <param hint="version">Lucene_30</param>
    <param desc="stopWords" type="System.IO.FileInfo, mscorlib">
      <param hint="fileName">[FULL_PATH_TO_SITECORE_ROOT_FOLDER]\Data\indexes\stopwords.txt</param>
    </param>
  </param>
</param>
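For illustration, a stopwords.txt for the path above might look like the following. This is only a sketch: the word list is an assumption chosen for the example, and it deliberately leaves out 'of' and 'the' so that a title such as "Aftermath of the first world war" keeps those terms. The file is plain text with one stop word per line.

```text
a
an
and
are
as
at
be
but
by
```

Since the same analyzer is applied when documents are indexed, the index will most likely need to be rebuilt after the list changes so that indexed terms and query terms line up.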
Further reading: I have written a blog post about this issue that might be of help: http://blog.horizontalintegration.com/2014/03/19/sitecore-standard-analyzer-managing-you-own-stop-words-filter/