I am a total beginner with Solr and have a problem with unwanted characters showing up in query results. For example, when I search for "foo bar", I get content containing "'foo' bar", etc. I only want exact matches. As far as I know, this can be configured in the schema.xml file. My content field type:
<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.LowerCaseFilterFactory" />
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldtype>
Please let me know if you know a solution. Kind regards.
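In both analyzers, the tokenizer should come first; your index analyzer lists the LowerCaseFilterFactory before it. The tokenizer splits the text into smaller units (words, most of the time), and the filters then process those tokens in order. For your need, the WhitespaceTokenizerFactory is probably the right choice: it splits only on whitespace, so punctuation such as quotes stays part of the token and 'foo' remains distinct from foo. The StandardTokenizerFactory, by contrast, discards that punctuation, so 'foo' and foo end up as the same token.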
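As a minimal sketch of that ordering (assuming you keep lowercasing; the comments describe the intent, not output from a real index), an index analyzer with the tokenizer listed first could look like this:

```xml
<analyzer type="index">
  <!-- Tokenizer first: splits only on whitespace, so 'foo' keeps its quotes -->
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <!-- Filters follow the tokenizer and are applied to its tokens in order -->
  <filter class="solr.LowerCaseFilterFactory" />
</analyzer>
```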
If you want an absolutely exact match, you do not need any filter after the tokenizer. If you do not want searches to be case-sensitive, however, add a LowerCaseFilterFactory after the tokenizer.
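Notice that you have two analyzers: one of type 'index' and one of type 'query'. As the names imply, the first is used when indexing content and the second when you run queries. A good rule of thumb is to use the same tokenizer and filters in both analyzers; otherwise the tokens produced at query time may not match the tokens stored in the index. Yours currently use KeywordTokenizerFactory at index time and StandardTokenizerFactory at query time.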
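Putting this together, here is a sketch of the field type with identical index and query analyzers. It assumes you want case-insensitive matching; drop both LowerCaseFilterFactory lines if you want matching to be case-sensitive as well:

```xml
<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace only; punctuation such as quotes stays in the token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- Lowercase each token so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <!-- Same chain as the index analyzer, so query tokens line up with indexed tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldtype>
```

When both chains are identical, you can also declare a single `<analyzer>` without a `type` attribute, which Solr uses for both indexing and querying. Because the index-time analysis changes, existing documents need to be reindexed before they pick up the new tokenization.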