
Solr Case-Insensitive Search

I have a problem with Solr search. I have documents whose address_s field contains the text "Nadi".

I use the Solr admin UI to find this data with a query like this:

address_s:*Nadi*

and it finds those documents. But when I use this query:

address_s:*nadi*

it doesn't find anything.
After some googling, I found an answer suggesting that I create a field type with the following definition:

<fieldType name="c_text" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I copied and pasted that definition into schema.xml, but it still doesn't work. What should I do? Can anyone help me?

asked Nov 23 '11 at 10:11 by Praditha




4 Answers

The address_s field should be defined as:

<field name="address_s" type="c_text" indexed="true" stored="true"/>

If you are using the default schema.xml, this explicit definition takes precedence over its dynamic field rule:

<dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>

Without the explicit definition, that rule would match address_s and make it a string field, on which no analysis (and therefore no lowercasing) is performed.
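Putting the pieces of this answer together, the relevant parts of a Solr 3.x-style schema.xml might look like the abridged sketch below. It is an excerpt under assumptions: the schema name and version attributes are placeholders, and a real schema also needs a uniqueKey and the rest of its fields and types.

```xml
<!-- Abridged sketch of a Solr 3.x-style schema.xml; name/version are placeholders -->
<schema name="example" version="1.4">
  <types>
    <!-- Analyzed type: split on whitespace, then lowercase every token -->
    <fieldType name="c_text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- Unanalyzed type: values are indexed verbatim, case included -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <!-- Explicit declaration: address_s goes through the c_text analyzers -->
    <field name="address_s" type="c_text" indexed="true" stored="true"/>
    <!-- Any other field ending in _s still falls back to the unanalyzed string type -->
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
```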

In Solr versions before 3.6, wildcard query terms do not undergo analysis.
So if you apply the lowercase filter at index time, the query address_s:*nadi* would work.
However, the query address_s:*Nadi* would not, because Nadi will not match the lowercased nadi in the index, so you would need to lowercase the queries on the client side.
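Assuming Solr 3.6 or later, wildcard and prefix terms can be passed through a "multiterm" analyzer, so client-side lowercasing is no longer needed. Solr derives this analyzer automatically from multi-term-aware filters such as LowerCaseFilterFactory, but it can also be declared explicitly. A sketch of the c_text type from the question with an explicit multiterm analyzer (check the exact behaviour against the documentation for your Solr version):

```xml
<!-- Sketch, assuming Solr 3.6+: explicit multiterm analyzer for wildcard/prefix terms -->
<fieldType name="c_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to the term of a wildcard query such as *Nadi*, which becomes *nadi* -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```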

answered Oct 17 '22 at 00:10 by Jayendra


I've used this as the field type:

<fieldType name="string" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And defined my fields using:

<field name="address" type="string" indexed="true" stored="true"/>

The result: my documents return the fields in their original case (as inserted), and I can search case-insensitively, using both upper- and lowercase letters.

Version: Solr 3.6
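The original case survives because stored values are returned exactly as sent; the analyzers only change the indexed terms. Since the default schema.xml already declares a StrField type named string, a variant with its own name may be less confusing. A sketch, where string_ci is a hypothetical name, that also uses a single untyped analyzer, which Solr applies at both index and query time:

```xml
<!-- Hypothetical type name "string_ci" keeps the stock StrField "string" type untouched -->
<fieldType name="string_ci" class="solr.TextField">
  <!-- No type attribute: this analyzer is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- stored="true" returns the value as inserted; the indexed terms are lowercased -->
<field name="address" type="string_ci" indexed="true" stored="true"/>
```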

answered Oct 17 '22 at 00:10 by Jeff Maes


Does your address_s field use this c_text field type in your schema.xml?

If your index has been created with the previous configuration, you need to re-index everything to take the changes into account.

answered Oct 16 '22 at 23:10 by jpountz


I have used something like this. In schema.xml, I've added a new fieldType:

<fieldType name="newType" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Keep the whole value as a single token (no splitting on whitespace) -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase so that matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Also index reversed tokens for faster leading-wildcard queries (index time only) -->
    <filter class="solr.ReversedWildcardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Assign the new type to the field that you want to make case- and whitespace-insensitive. Then construct the Solr query in the form fieldName:(*fieldValue\ *)
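Assigning the type is an ordinary field declaration. A minimal sketch, borrowing address_s from the question purely as an example field name. Because KeywordTokenizerFactory turns the whole value into one token, a space inside the wildcard term has to be escaped with a backslash, as in the query form above.

```xml
<!-- Example only: address_s is borrowed from the question; substitute your own field -->
<!-- indexed="true" makes it searchable; stored="true" returns the original value -->
<field name="address_s" type="newType" indexed="true" stored="true"/>
```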

answered Oct 17 '22 at 00:10 by Manos