
Solr query: stop words, OR and AND weirdness

Tags: solr, lucene

We are using Solr 3.5 with a schema that has the following field type declaration:

<fieldType name="fieldN" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" 
            catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="256"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"
            />
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="256"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"
            />
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

When we send a query like this:

field1:"term1"

Solr returns results.

When we run this query we still get results:

field1:"term1" AND (field2:term2 OR field3:term2)

Here term2 is a stop word and term1 is a regular word.

But when we send a query like this:

field1:"term1" AND (field2:term2 OR field3:term2 OR field4:term2)

nothing is returned.

We also noticed that rewriting the query like this:

(field1:"term1" AND (field2:term2 OR field3:term2)) OR (field1:"term1" AND field4:term2)

returns results too, but since the real query has to search for one term across about 200 fields, this option is less preferred.

Thanks.

Noam asked Apr 04 '12 14:04


1 Answer

I am guessing that your 'weirdness' has more to do with your solrconfig rules than with the stop words in your query. I have run into similar issues with stop-word queries inside subqueries, and the cause turned out to be the Minimum Match rules in my Dismax search handler.

Look inside your solrconfig.xml for the requestHandler your search is using. It should have an "mm" (Minimum Match) string declared. Try adjusting the rules so they are more or less restrictive, depending on your goal.
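For reference, a handler with an "mm" rule might look roughly like the sketch below. This is only an illustration: it assumes the search goes through a Dismax handler, and the handler name "/search" and the qf field list are made up for the example (the field names are borrowed from the question). Your own solrconfig.xml will differ.

```xml
<!-- Hypothetical handler; the name and field list are assumptions, not taken from the asker's config -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the Dismax query parser for this handler -->
    <str name="defType">dismax</str>
    <!-- Fields to search; illustrative only -->
    <str name="qf">field1 field2 field3 field4</str>
    <!-- Minimum Match: how many optional clauses must match.
         1 is a deliberately permissive value for experimenting;
         100% would require every optional clause to match. -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

If loosening "mm" changes which of the queries above return results, that would suggest the Minimum Match rule is involved rather than the stop word filter alone.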

Best of luck!

harmstyler answered Oct 26 '22 23:10