
How to stop Solr from returning results when a phrase contains a stopword?

I have a problem when searching Solr for a phrase that contains stopwords. Solr returns results for the phrase even though it contains a stopword, which is not the output I expect.

I added the word "test" to the stopwords.txt file. In schema.xml, the field is defined like this:

<field name="searchword" type="text" indexed="true" stored="true"   />

I indexed some data, then searched in the Solr browser window with searchword:"test" and got no results. Then I searched for a phrase, searchword:"test data", and got results. How can I avoid this? If the query contains a stop word, Solr should not return any results.

The following is the fieldType I'm using:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
    <analyzer type="query">         
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" type="phrase"/>
    </analyzer>
</fieldType>

I need a solution so that Solr does not return any results when I search for a phrase that contains the stopword (test).

Sriram M asked Nov 26 '11 10:11


1 Answer

A "stop" word is a word that is ignored in a search; it is not a word that "stops" or invalidates results. So the behaviour you describe is correct: that is what stop words are supposed to do.
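To illustrate why the two queries behave differently, here is a toy model of a stop filter. It assumes a much simpler analyzer (whitespace split, lowercasing, stop-word removal) than the chain in the question, and the names `analyze` and `STOPWORDS` are made up for this sketch; it is not Solr's actual implementation.

```python
# Toy model of a stop filter; NOT Solr's real analysis chain.
# STOPWORDS mirrors the single entry the question added to stopwords.txt.
STOPWORDS = {"test"}


def analyze(text, stopwords=STOPWORDS):
    """Split on whitespace, lowercase, and drop any stop words."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


print(analyze("test"))       # []
print(analyze("test data"))  # ['data']
```

In this simplified model the query "test" is reduced to nothing, so there is nothing left to match, while "test data" is reduced to "data" and can still match documents that contain that term.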

There is no way I know of in Solr to "stop" results from coming back whenever a query uses a particular word (maybe someone has an idea?).

The only things I can think of are:

- Don't send the query to Solr when you observe those terms in the query :)
- Remove the terms from the documents before you index them (e.g. using an UpdateRequestProcessor) and use AND queries; that way, whenever a term that is not indexed appears in the query, you will get zero results.
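The first suggestion, checking the query on the client before it reaches Solr, could look roughly like the sketch below. All names here (`load_stopwords`, `contains_stopword`, `guarded_search`) are hypothetical, and `search` stands for whatever function your application already uses to send a query to Solr. It assumes the stop-word list has one word per line, with blank lines and lines starting with `#` ignored, and that you pass in the user's raw phrase rather than the full `field:"..."` query string.

```python
import re


def load_stopwords(lines):
    """Parse stopwords.txt-style lines: one word per line, '#' lines are comments."""
    words = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words


def contains_stopword(phrase, stopwords):
    """Return True if any word in the user's phrase is a stop word."""
    tokens = re.findall(r"\w+", phrase.lower())
    return any(tok in stopwords for tok in tokens)


def guarded_search(phrase, stopwords, search):
    """Return an empty result without calling `search` if the phrase has a stop word.

    `search` is a hypothetical callable that sends the phrase to Solr
    and returns a list of results.
    """
    if contains_stopword(phrase, stopwords):
        return []
    return search(phrase)
```

With a stop-word set containing "test", `guarded_search("test data", ...)` returns an empty list without contacting Solr, while a phrase such as "sample data" is passed through to `search` unchanged.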

Hugo Zaragoza answered Nov 07 '22 12:11
