Solr search query case sensitiveness

Tags: java, apache, solr

I am running a Solr search over records whose FirstName values are:

abcd
Abcd
abcD
ABcd
abCd
abCD

I want to search these with wildcard support, and I need to understand exactly how the search behaves with respect to case sensitivity.

For example, if I pass the FirstName parameter as ab* versus Ab*, which records would be returned?

Is there some way to force the search to be case-sensitive or case-insensitive?

copenndthagen asked Jul 17 '12 08:07


4 Answers

It depends on how you define your fields in schema.xml. If you use LowerCaseFilterFactory both while indexing and while querying, then queries on that field will be case-insensitive. Otherwise they will be case-sensitive.

<filter class="solr.LowerCaseFilterFactory"/>
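To see why the filter has to be applied on both sides, here is a minimal Python sketch. It is a toy model of term matching, not Solr code: the `search` function and its two flags are hypothetical names that stand in for lowercasing at index time and at query time.

```python
# Toy model of exact term matching; this is NOT Solr code, only an illustration.
NAMES = ["abcd", "Abcd", "abcD", "ABcd", "abCd", "abCD"]


def search(query, values, lowercase_index, lowercase_query):
    """Return the original values whose indexed term equals the analyzed query."""
    analyzed_query = query.lower() if lowercase_query else query
    hits = []
    for value in values:
        # The indexed term is lowercased only if the index-side filter is on.
        term = value.lower() if lowercase_index else value
        if term == analyzed_query:
            hits.append(value)
    return hits


no_filter = search("Abcd", NAMES, lowercase_index=False, lowercase_query=False)
query_only = search("Abcd", NAMES, lowercase_index=False, lowercase_query=True)
both_sides = search("Abcd", NAMES, lowercase_index=True, lowercase_query=True)
```

In this model, `no_filter` contains only `Abcd`, `query_only` contains only `abcd` (the lowercased query no longer matches the mixed-case terms), and `both_sides` contains all six names.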
Parvin Gasimzade answered Oct 20 '22 00:10


The field types defined in the default Solr schema behave very differently.

The 'string' type stores a value as one exact string without tokenizing it, so it only matches the complete value, including case.

'text_general' typically tokenizes the value and applies secondary processing (such as lowercasing), so matching is case-insensitive and works on individual words. It is useful whenever you want to match part of a sentence.

If the sample "Search into the sentence" is indexed into both fields, you must search for exactly Search into the sentence to get a hit from the string field, while the text_general field can also be matched by individual words such as search or sentence.

In the example below, seller_name must match the search string exactly, while product_name is searched word by word.

Example:

<field name="seller_name" type="string" indexed="true" stored="true"/>
<field name="product_name" type="text_general" indexed="true" stored="true"/>
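The difference between the two field types can be sketched in a few lines of Python. This is a simplified model, not how Solr is implemented: it assumes 'text_general' only splits on whitespace and lowercases, and that every query word must be present. The function names are hypothetical.

```python
SENTENCE = "Search into the sentence"


def string_field_match(indexed_value, query):
    """Model of a 'string' field: the whole value is compared as-is, case included."""
    return indexed_value == query


def text_field_match(indexed_value, query):
    """Model of a 'text_general' field: whitespace tokens, lowercased on both sides."""
    tokens = {token.lower() for token in indexed_value.split()}
    return all(word.lower() in tokens for word in query.split())


exact_hit = string_field_match(SENTENCE, "Search into the sentence")
partial_on_string = string_field_match(SENTENCE, "sentence")
partial_on_text = text_field_match(SENTENCE, "SENTENCE")
```

Here `exact_hit` is true, `partial_on_string` is false because a string field never matches part of a value, and `partial_on_text` is true despite the different case.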
Aman Garg answered Oct 20 '22 01:10


You configure it within your schema. For example:

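<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <!-- a TextField analyzer needs a tokenizer; StandardTokenizerFactory is one common choice -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>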

This means query terms on the field are lowercased, which gives the impression of a case-insensitive search (as long as the indexed terms are lowercased as well).

Francisco Spaeth answered Oct 20 '22 01:10


By default, a value is matched exactly against the indexed value, so the search is case-sensitive. If you want a field to be case-insensitive, the usual way is to use a field type with a lowercase filter. It makes all the indexed content the same case, which in practice makes the search case-insensitive (since the query value will also be lowercased).

The example schema does this for the 'text' and 'text_en' field types:

<filter class="solr.LowerCaseFilterFactory"/>

There are, however, a few particular areas where automatic lowercasing can cause trouble, wildcard queries among them: the wildcard term was historically not passed through the query analyzer, so it was not lowercased. MultitermQueryAnalysis was introduced in Solr 3.6 and 4.0 to handle those situations. 3.6 and 4.0 should handle wildcard searches the right way automatically if the field is already lowercased.
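Applied to the names in the question, the behaviour can be modelled roughly as below. This is a toy Python model, not Solr code: `wildcard_search` and its flags are hypothetical, glob matching via `fnmatchcase` stands in for Solr's wildcard matching, and `analyze_wildcards` stands in for the 3.6+ multiterm handling.

```python
from fnmatch import fnmatchcase

NAMES = ["abcd", "Abcd", "abcD", "ABcd", "abCd", "abCD"]


def wildcard_search(pattern, values, index_lowercased, analyze_wildcards):
    """Return the original values whose indexed term matches the wildcard pattern."""
    if analyze_wildcards:
        # Models multiterm analysis lowercasing the wildcard term.
        pattern = pattern.lower()
    hits = []
    for value in values:
        term = value.lower() if index_lowercased else value
        # fnmatchcase is always case-sensitive, like matching against raw terms.
        if fnmatchcase(term, pattern):
            hits.append(value)
    return hits


# Field without a lowercase filter: matching is case-sensitive.
raw_lower = wildcard_search("ab*", NAMES, index_lowercased=False, analyze_wildcards=False)
raw_upper = wildcard_search("Ab*", NAMES, index_lowercased=False, analyze_wildcards=False)

# Lowercased field, wildcard term not analyzed (the pre-3.6 situation).
old_upper = wildcard_search("Ab*", NAMES, index_lowercased=True, analyze_wildcards=False)

# Lowercased field, wildcard term lowercased too (the 3.6 / 4.0 situation).
new_upper = wildcard_search("Ab*", NAMES, index_lowercased=True, analyze_wildcards=True)
```

In this model `raw_lower` holds the four names starting with lowercase `ab`, `raw_upper` holds only `Abcd`, `old_upper` is empty because no lowercased term starts with `Ab`, and `new_upper` holds all six names.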

If you're on a version before 3.6 and wildcards don't behave correctly, I'd suggest lowercasing the name in the query yourself (as long as you've applied the LowerCaseFilterFactory when indexing as well).
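On the client side that workaround could look roughly like this. It is a minimal sketch: the helper name `lowercase_wildcard_query` is hypothetical, and it only builds the query string rather than calling Solr. The field name is left untouched because only the term needs lowercasing.

```python
def lowercase_wildcard_query(field, pattern):
    """Build a 'field:pattern' query string with the wildcard pattern lowercased."""
    return f"{field}:{pattern.lower()}"


query = lowercase_wildcard_query("FirstName", "Ab*")
print(query)
```

The resulting string can then be passed as the q parameter in place of the user's original input.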

MatsLindh answered Oct 19 '22 23:10