
Solr query/field analyzer

Tags:

solr

I am a total beginner with Solr and have a problem with unwanted characters getting into query results. For example, when I search for "foo bar" I get content containing "'foo' bar" and so on. I only want exact matches. As far as I know, this can be set up in the schema.xml file. My content field type:

<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <filter class="solr.LowerCaseFilterFactory" />
        <tokenizer class="solr.KeywordTokenizerFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldtype>

Please let me know if you know the solution. Kind regards.

Daniel asked Aug 23 '10 08:08


1 Answer

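In both analyzers, the tokenizer should be the first element, followed by any filters; your index analyzer lists the filter before the tokenizer. The tokenizer splits the text into smaller units (words, most of the time). For your needs, the WhitespaceTokenizerFactory is probably the right choice, since it splits only on whitespace and leaves punctuation such as quotes attached to the token.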

If you want an absolutely exact match, you do not need any filter after the tokenizer. But if you do not want searches to be case sensitive, you need to add a LowerCaseFilterFactory.

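Notice that you have two analyzers: one of type 'index' and the other of type 'query'. As the names imply, the first is used when indexing content and the second when running queries. Yours currently differ: the index analyzer uses KeywordTokenizerFactory, which keeps the whole field value as a single token, while the query analyzer uses StandardTokenizerFactory, which splits the query into words. A rule that is almost always good is to use the same set of tokenizers and filters for both analyzers.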
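Putting the points above together, the field type could look something like the sketch below. It keeps the `textNoStem` name and attributes from the question and assumes case-insensitive matching is wanted; drop the two `LowerCaseFilterFactory` lines if matches should be case sensitive. This is an illustration of the advice, not a tested configuration for your data.

```xml
<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
    <!-- In each analyzer the tokenizer comes first, then the filters -->
    <analyzer type="index">
        <!-- Split on whitespace only; punctuation stays part of the token -->
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- Optional: makes matching case insensitive -->
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <!-- Same chain at query time so query tokens line up with indexed tokens -->
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldtype>
```

With this chain, `'foo'` is indexed as the token `'foo'` (quotes included), so a query for `foo` should no longer match it. Because the index analyzer changes, existing documents need to be reindexed before the new behavior shows up in results.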

Pascal Dimassimo answered Nov 10 '22 19:11