
Lucene Standard Analyzer vs Snowball


I'm just getting started with Lucene.Net. I indexed 100,000 rows using the standard analyzer, ran some test queries, and noticed that plural queries don't return results if the indexed term was singular. I understand the Snowball analyzer adds stemming support, which sounds nice. However, I'm wondering whether there are any drawbacks to going with Snowball over standard. Am I losing anything by going with it? Are there any other analyzers to consider?

alchemical asked Oct 06 '10 17:10



2 Answers

Yes, by using a stemmer such as Snowball, you are losing information about the original form of your text. Sometimes this will be useful, sometimes not.

For example, Snowball will stem "organization" into "organ", so a search for "organization" will also return documents that contain only "organ", without any scoring penalty.

Whether or not this is appropriate for you depends on your content and on the type of queries you are supporting (for example, are the searches very basic, or are your users sophisticated and relying on search to filter results down accurately?). You may also want to look into less aggressive stemmers, such as KStem.
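To make the "organization"/"organ" conflation concrete, here is a minimal sketch in plain Python. It does not use Lucene or the real Snowball algorithm: `toy_stem`, its suffix list, and the three sample documents are all hypothetical and only imitate the effect described above.

```python
from collections import defaultdict

# Hypothetical suffix rules for illustration only; the real Snowball
# stemmer applies a much larger, ordered rule set.
SUFFIXES = ("izations", "ization", "s")

def toy_stem(term):
    """Lowercase the term and strip the first matching suffix."""
    term = term.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def build_index(docs, normalize):
    """Build an inverted index: normalized term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index

def search(index, query, normalize):
    """Return the sorted ids of documents containing the normalized query term."""
    return sorted(index.get(normalize(query), set()))

# Made-up sample documents.
DOCS = {
    1: "organ donor registry",
    2: "charity organization",
    3: "pipe organs",
}

plain = build_index(DOCS, str.lower)   # lowercasing only, no stemming
stemmed = build_index(DOCS, toy_stem)  # lowercasing plus toy stemming

print(search(plain, "organizations", str.lower))   # []
print(search(stemmed, "organizations", toy_stem))  # [1, 2, 3]
```

Without stemming, the plural query finds nothing, which is the problem from the question. With stemming, it finds the organization document, but also the two documents about organs, all as equally good matches.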

Avi answered Jan 08 '23 06:01



The Snowball analyzer will increase your recall, usually at some cost to precision, because it stems terms while the standard analyzer does no stemming at all. So you need to evaluate your search results to see whether, for your data, recall or precision matters more.
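One way to run that evaluation is to hand-label the relevant documents for a few test queries and compute precision and recall for each analyzer. The sketch below uses made-up document ids and relevance judgments; `precision_recall` is a hypothetical helper, not part of Lucene.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant hits / everything retrieved
    recall    = relevant hits / everything relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments for one query: documents 2 and 4 are relevant.
relevant = {2, 4}
standard_hits = {2}          # assumed result set without stemming
stemmed_hits = {1, 2, 3, 4}  # assumed result set with stemming

print(precision_recall(standard_hits, relevant))  # (1.0, 0.5)
print(precision_recall(stemmed_hits, relevant))   # (0.5, 1.0)
```

In this invented case the stemmed index finds every relevant document but half of what it returns is noise, while the unstemmed index returns only good matches and misses half of them. Averaging these numbers over a representative set of real queries gives a basis for choosing between the two.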

Skarab answered Jan 08 '23 07:01
