I still don't think I understand the Lucene indexing options.
The available options are:
Store.Yes
Store.No
and
Index.Tokenized
Index.Un_Tokenized
Index.No
Index.No_Norms
I don't really understand the store option. Why would you ever want to NOT store your field?
As I understand it, tokenizing splits up the content and removes the noise words/separators (like "and", "or", etc.).
I don't have a clue what norms could be.
How are tokenized values stored?
What happens if I store the value "my string" in "fieldName"?
Why doesn't a query
fieldName:my string
return anything?
In Lucene, a Document is the unit of indexing and search. An index consists of one or more Documents. Indexing involves adding Documents to an IndexWriter, and searching involves retrieving Documents from an index via an IndexSearcher.
Lucene's index is composed of segments, each of which contains a subset of all the documents in the index, and is a complete searchable index in itself, over that subset. As documents are written to the index, new segments are created and flushed to directory storage.
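To make the segment idea concrete, here is a toy model in Python. It is not Lucene's API: the classes `Segment` and `ToyIndex` and the `max_buffered_docs` parameter are hypothetical. Each segment is a small, self-contained inverted index over its own documents. New documents are buffered and then flushed as a new segment, and a search runs against every segment and combines the hits. Real Lucene also merges segments and tracks deletions, and this sketch leaves both out.

```python
from collections import defaultdict


class Segment:
    """A self-contained, searchable inverted index over a subset of the documents."""

    def __init__(self, docs):
        # docs: list of (doc_id, list_of_terms); the segment is never modified after this
        self.postings = defaultdict(list)
        for doc_id, terms in docs:
            for term in set(terms):
                self.postings[term].append(doc_id)

    def search(self, term):
        return list(self.postings.get(term, []))


class ToyIndex:
    """Buffers added documents and flushes them as new segments (hypothetical model)."""

    def __init__(self, max_buffered_docs=2):
        self.max_buffered_docs = max_buffered_docs
        self.buffer = []
        self.segments = []
        self.next_id = 0

    def add(self, terms):
        self.buffer.append((self.next_id, terms))
        self.next_id += 1
        if len(self.buffer) >= self.max_buffered_docs:
            self.flush()

    def flush(self):
        # Buffered documents only become searchable once they are flushed into a segment.
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = []

    def search(self, term):
        # Every segment is searched independently and the results are combined.
        hits = []
        for segment in self.segments:
            hits.extend(segment.search(term))
        return sorted(hits)


index = ToyIndex(max_buffered_docs=2)
for text in ["lucene index", "segment merge", "lucene segment"]:
    index.add(text.split())
index.flush()
print(len(index.segments))     # 2
print(index.search("lucene"))  # [0, 2]
```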
Store.Yes
Means that the value of the field will be stored in the index.
Store.No
Means that the value of the field will NOT be stored in the index.
Store.Yes/No does not affect indexing or searching in Lucene. It only tells Lucene whether you want it to act as a datastore for the field's values. If you use Store.Yes, then when you search, the value of that field will be included in your search result Documents.
If you're storing your data in a database and only using the Lucene index for searching, then you can get away with Store.No on all of your fields. However, if you're using the index as storage as well, then you'll want Store.Yes.
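Here is a toy model of the distinction. It assumes nothing about Lucene's real classes, and `add_document` and `search` are hypothetical helpers. Every field is indexed, so it can be matched. Only the fields flagged as stored are copied into the document you get back from a search.

```python
def add_document(index, stored_docs, doc_id, fields):
    """fields maps name -> (value, store). Every field is indexed; only store=True values are kept."""
    stored = {}
    for name, (value, store) in fields.items():
        for term in value.lower().split():
            index.setdefault((name, term), set()).add(doc_id)
        if store:
            stored[name] = value
    stored_docs[doc_id] = stored


def search(index, stored_docs, field, term):
    """Return the stored fields of every document whose field contains the term."""
    return [stored_docs[d] for d in sorted(index.get((field, term), set()))]


index, stored_docs = {}, {}
add_document(index, stored_docs, 0, {
    "id": ("42", True),                          # like Store.Yes: returned with hits
    "body": ("full article text here", False),   # like Store.No: searchable only
})
hits = search(index, stored_docs, "body", "article")
print(hits)  # [{'id': '42'}]
```

This matches the database pattern described above. You store only a key, search on the unstored text, and then load the full record from your database using the returned key.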
Index.Tokenized
Means that the field will be tokenized (run through the analyzer) when it's indexed (you got that one), so each word becomes a separate searchable term. This is useful for long fields with multiple words.
Index.Un_Tokenized
Means that the field will not be analyzed; its whole value is indexed as a single term, so only a query for the exact full value matches it. This is useful for keyword/single-word and some short multi-word fields.
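This also answers "how are tokenized values stored?": each token becomes its own term in the inverted index. The sketch below is a rough, hypothetical stand-in for an analyzer. In Lucene the tokenizer does the splitting and a separate filter removes stop words. This sketch lumps both together, and its tiny `STOP_WORDS` set is an assumption, not any analyzer's real list.

```python
import re

# Hypothetical stop-word list; real analyzers ship their own.
STOP_WORDS = {"a", "and", "or", "the"}


def tokenized_terms(value):
    """Tokenized: lowercase, split on non-word characters, drop stop words."""
    words = re.findall(r"\w+", value.lower())
    return [w for w in words if w not in STOP_WORDS]


def untokenized_terms(value):
    """Un-tokenized: the whole value becomes a single term, exactly as given."""
    return [value]


print(tokenized_terms("My String and the Rest"))  # ['my', 'string', 'rest']
print(untokenized_terms("My String"))              # ['My String']
```

With the tokenized version, a search for the term `string` finds the document. With the untokenized version, only the exact term `My String` does.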
Index.No
Exactly what it says: the field will not be indexed and is therefore unsearchable. However, you can use Index.No together with Store.Yes to store a value that you don't want to be searchable.
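The Store and Index options are independent flags, so any combination is possible. This toy model shows the combinations discussed here. `FieldOptions` and `index_field` are hypothetical names that only loosely mirror the option names above; they are not Lucene classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldOptions:
    """Toy flags, loosely mirroring the Store/Index options (hypothetical, not Lucene's API)."""
    indexed: bool
    tokenized: bool
    norms: bool   # affects scoring only, so this sketch does not model it
    stored: bool


def index_field(value, opts):
    """Return (terms added to the inverted index, value kept for retrieval or None)."""
    if not opts.indexed:
        terms = []                      # Index.No: nothing becomes searchable
    elif opts.tokenized:
        terms = value.lower().split()   # crude whitespace tokenizer
    else:
        terms = [value]                 # Un_Tokenized / No_Norms: one verbatim term
    return terms, (value if opts.stored else None)


# Index.No + Store.Yes: unsearchable, but the value comes back with search results.
thumbnail = FieldOptions(indexed=False, tokenized=False, norms=False, stored=True)
print(index_field("/img/42.png", thumbnail))  # ([], '/img/42.png')

# Index.Tokenized + Store.No: searchable word by word, but not retrievable.
body = FieldOptions(indexed=True, tokenized=True, norms=True, stored=False)
print(index_field("My String", body))         # (['my', 'string'], None)
```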
Index.No_Norms
Same as Index.Un_Tokenized, except that a little index space (about one byte per document for that field) is saved by not storing the normalization data, the "norms". Norms are what Lucene uses for index-time boosting and field-length normalization.
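In the classic TF-IDF scoring (`DefaultSimilarity` in Lucene 2.x/3.x), a field's norm is roughly its index-time boost times 1/sqrt(number of terms in the field). It is then squeezed lossily into a single byte. As a result, a match in a short field scores higher than the same match in a long one. Here is a minimal sketch of just that formula, ignoring the byte encoding:

```python
import math


def length_norm(num_terms, boost=1.0):
    """Classic TF-IDF field norm: index-time boost / sqrt(number of terms in the field)."""
    return boost / math.sqrt(num_terms)


short_title = "lucene"
long_body = "lucene is a search library written in java and ported to dotnet"
print(length_norm(len(short_title.split())))  # 1.0
print(length_norm(len(long_body.split())))    # about 0.289 (12 terms)
```

An un-tokenized field always has exactly one term, so its length norm is just the boost. That is why dropping norms on such fields costs little unless you rely on index-time boosts. When norms are omitted, every document is scored as if its norm were 1.0.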
For further reading, the Lucene javadocs are priceless (current API version 4.4.0):
As for your last question, about why your query isn't returning anything: without knowing more about how you're indexing that field, I'd say it's because your fieldName qualifier is attached only to the term 'my'. The query parser searches for 'string' in the default field instead. To search for the phrase "my string" you want:
fieldName:"my string"
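To search for the words "my" and "string" anywhere in the fieldName field (with the default OR operator, a document containing either word matches; write fieldName:(+my +string) to require both):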
fieldName:(my string)
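The field-scoping rule behind these three queries can be modeled in a few lines. This is a hypothetical, heavily simplified sketch, not Lucene's QueryParser. `toy_parse` and the default field name `contents` are assumptions. A `field:` prefix binds only to the single term, quoted phrase, or parenthesized group right after it. Everything else goes to the default field.

```python
import re

# Optional "field:" prefix, then a quoted phrase, a parenthesized group, or a bare term.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|\(([^)]*)\)|(\S+))')


def toy_parse(query, default_field="contents"):
    """Return (field, kind, text) clauses; a simplified model of QueryParser field scoping."""
    clauses = []
    for m in TOKEN.finditer(query):
        field, phrase, group, term = m.groups()
        field = field or default_field
        if phrase is not None:
            clauses.append((field, "phrase", phrase))
        elif group is not None:
            # A field prefix before parentheses applies to every term inside them.
            clauses.extend((field, "term", t) for t in group.split())
        else:
            clauses.append((field, "term", term))
    return clauses


print(toy_parse("fieldName:my string"))
# [('fieldName', 'term', 'my'), ('contents', 'term', 'string')]
print(toy_parse('fieldName:"my string"'))
# [('fieldName', 'phrase', 'my string')]
```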
In case any Java users stumble upon this, the same options in the March 2009 answer still exist in the Lucene 4.6.0 Java library but are deprecated. The current way to set these options is via FieldType.