I am new to Solr. After reading Solr's wiki, I still don't understand the difference between WhitespaceTokenizerFactory and StandardTokenizerFactory. What is the real difference between them?
They differ in how they split the analyzed text into tokens.
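The StandardTokenizer is grammar-based. According to the Lucene javadoc, it implements the word break rules of the Unicode Text Segmentation algorithm (Unicode Standard Annex #29), so it splits text at whitespace and at most punctuation, and discards the punctuation.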
The WhitespaceTokenizer does this based on whitespace characters:
A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-Whitespace characters form tokens.
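To make the difference concrete, here is a minimal schema sketch with one field type per tokenizer. The field type names `text_ws_example` and `text_std_example` are made up for illustration, and no filters are configured, so the comments describe the tokenizer output only.

```xml
<!-- Hypothetical field type: splits only at whitespace.
     "wi-fi, solr" would become the tokens "wi-fi," and "solr"
     (punctuation stays attached to the token). -->
<fieldType name="text_ws_example" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field type: splits on Unicode word boundaries.
     "wi-fi, solr" would become the tokens "wi", "fi" and "solr"
     (the hyphen and the comma are dropped). -->
<fieldType name="text_std_example" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

You can check what each field type actually produces for your own input on the Analysis screen of the Solr admin UI.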
You should pick the tokenizer that best fits your application. Whichever you choose, you should generally use the same tokenizer for indexing and searching; otherwise the query terms may not match the terms stored in the index.
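As a rough sketch of how that looks in the schema, a field type can declare separate index-time and query-time analyzers that share the same tokenizer. The name `text_consistent_example` is an assumption for illustration only.

```xml
<!-- Hypothetical field type: the same tokenizer is used when
     documents are indexed and when queries are parsed. -->
<fieldType name="text_consistent_example" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If the two chains are identical, a single `<analyzer>` element without a `type` attribute has the same effect, because Solr then applies it at both index and query time.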