
In Lucene, what is the difference between ANALYZED and ANALYZED_NO_NORMS?

I could not understand the difference between two ways of indexing: ANALYZED and ANALYZED_NO_NORMS. I read the Lucene Javadoc but did not understand the difference.

Can someone tell me more about NORMS? What are the benefits or limitations that they bring to indexing?

vicpro asked Jul 22 '11 11:07


1 Answer

ANALYZED

Indexes the tokens produced by running the field's value through an Analyzer. This is useful for ordinary text. The analyzer might be something like a Snowball stemmer analyzer:

  • http://e-mats.org/2009/05/modifying-a-lucene-snowball-stemmer/

ANALYZED_NO_NORMS

Also runs the field's value through an Analyzer, but does not store norms for the field.

  • http://lucene.apache.org/java/2_4_0/scoring.html

Norms are computed at index time so that documents can be scored quickly at query time. They are usually all loaded into memory, so that when you run a query against an index, the search results can be scored quickly.
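To make the idea concrete, here is a minimal sketch of what a norm roughly contains, based on the scoring page linked above: the document boost, the field boost, and a length normalization factor multiplied together. It assumes the default-style length normalization of 1 / sqrt(number of terms in the field). The class and method names are made up for illustration and are not Lucene API, and real Lucene additionally squeezes the result into a single byte, which loses precision.

```java
// Illustrative only: NormSketch is a hypothetical helper, not a Lucene class.
final class NormSketch {

    // Default-style length normalization (assumed): shorter fields get a larger factor.
    // numTerms must be greater than zero.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // The norm folds index-time boosts and length normalization into one number.
    static double norm(double docBoost, double fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Same boosts, different field lengths: the 4-term field gets the larger norm.
        System.out.println("4 terms:   " + norm(1.0, 1.0, 4));
        System.out.println("100 terms: " + norm(1.0, 1.0, 100));
        // A field boost of 2.0 doubles the norm of the 100-term field.
        System.out.println("boosted:   " + norm(1.0, 2.0, 100));
    }
}
```

With ANALYZED_NO_NORMS this per-document factor is simply not stored, so a match in a short field is not favoured over a match in a long one, and any index-time boost has no effect.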

No norms means that index-time field and document boosting and field length normalization are disabled. The benefit is lower memory usage: during searching, norms take up one byte of RAM per indexed field for every document in the index.
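As a rough back-of-the-envelope check of that memory cost, the one-byte-per-field-per-document rule can be turned into a small calculation. The document and field counts below are invented for the example, and the helper is hypothetical (it is not part of Lucene); it only multiplies the two numbers.

```java
// Illustrative only: NormMemoryEstimate is a hypothetical helper, not a Lucene class.
final class NormMemoryEstimate {

    // One byte per document for each indexed field that keeps its norms.
    static long normBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // assumed index size
        int fieldsWithNorms = 5;    // assumed number of fields indexed as ANALYZED

        long bytes = normBytes(numDocs, fieldsWithNorms);
        System.out.println(bytes + " bytes of norms, about "
                + bytes / (1024.0 * 1024.0) + " MiB");
    }
}
```

Under these assumptions, switching fields that never need boosting or length normalization (IDs, tags, and similar) to ANALYZED_NO_NORMS removes their share of that total.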

Justin Shield answered Oct 22 '22 11:10