 

Solr stopwords and empty query

I have a Solr instance with a number of documents and an indexed field.

I now want to apply a stopwords list to the query to increase the number of results, ignoring at query time any word that appears in the stopwords list.

So in my configuration I'm using solr.StopFilterFactory in the query analyzer.

What I expect is that if I search for a single word that is in the stopwords list, the result set is the same as that of a wildcard query, text_title:*, that is, the full document set.

But instead I get 0 results. Am I missing something about the behaviour of the stopwords filter?

Lorenzo Marcon asked Nov 20 '22 05:11


1 Answer

solr.StopFilterFactory

This filter discards, or stops analysis of, tokens that are on the given stop words list. A standard stop words list is included in the Solr config directory, named stopwords.txt, which is appropriate for typical English language text.

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter

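This filter actually removes the tokens that appear in your query; it does not replace them with *. A query that consists only of stop words is therefore left with no terms at all, which is why you get no results instead of the full document set.
Example (matching is case-sensitive by default, so the capitalized "To" is not stopped; the remaining tokens keep their original positions):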

In: "To be or what?"
Tokenizer to Filter: "To"(1), "be"(2), "or"(3), "what"(4)
Out: "To"(1), "what"(4)
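A minimal Python sketch of the behaviour described above, under a simplified model (this is not Solr's code). The stop-word set, the regex tokenizer, the tiny index and the helper names `tokenize`, `stop_filter` and `search` are all hypothetical. It reproduces the "To be or what?" example and shows that a query made only of stop words ends up with no terms, so an OR over zero terms matches no documents.

```python
import re

# Hypothetical stop-word list, lowercase like a typical stopwords.txt.
STOP_WORDS = {"to", "be", "or", "the"}

def tokenize(text):
    """Split into letter runs and number them from position 1."""
    words = re.findall(r"[A-Za-z]+", text)
    return [(word, pos) for pos, word in enumerate(words, start=1)]

def stop_filter(tokens, ignore_case=False):
    """Drop tokens found in STOP_WORDS; survivors keep their original positions.

    Matching is case-sensitive unless ignore_case is True, so "To" is kept
    by default even though "to" is a stop word.
    """
    kept = []
    for term, pos in tokens:
        key = term.lower() if ignore_case else term
        if key not in STOP_WORDS:
            kept.append((term, pos))
    return kept

def search(index, query):
    """OR together the posting lists of every term left after analysis."""
    hits = set()
    for term, _ in stop_filter(tokenize(query), ignore_case=True):
        hits |= index.get(term.lower(), set())
    return hits  # no terms left -> empty set, not "all documents"

# Hypothetical index for text_title: term -> ids of matching documents.
index = {"solr": {1, 2}, "query": {1, 3}}

print(stop_filter(tokenize("To be or what?")))  # [('To', 1), ('what', 4)]
print(search(index, "Solr query"))              # {1, 2, 3}
print(search(index, "the"))                     # set()
```

In this model nothing turns an empty term list into a match-all query; that would have to be added explicitly.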

Try this filter instead:

solr.SuggestStopFilterFactory

Like Stop Filter, this filter discards, or stops analysis of, tokens that are on the given stop words list. Suggest Stop Filter differs from Stop Filter in that it will not remove the last token unless it is followed by a token separator.

You would normally use the ordinary StopFilterFactory in your index analyzer and then SuggestStopFilter in your query analyzer.

This filter removes stop words from your query, except for the last token when it is not followed by a token separator.

How to use:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="stopwords.txt" format="wordset"/>
</analyzer>

Example:

In: "The The"
Tokenizer to Filter: "the"(1), "the"(2)
Out: "the"(2)
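A minimal Python sketch of the Suggest Stop Filter rule under a simplified model (not Lucene's implementation). It assumes a whitespace tokenizer plus lowercasing, as in the analyzer above; the stop-word set and the names `analyze` and `suggest_stop_filter` are hypothetical. A stop word is dropped unless it is the last token and nothing follows it in the input.

```python
import re

# Hypothetical stop-word list.
STOP_WORDS = {"the", "a", "of"}

def analyze(text):
    """Whitespace tokenizer + lowercase filter: (term, position, end_offset)."""
    return [(m.group().lower(), pos, m.end())
            for pos, m in enumerate(re.finditer(r"\S+", text), start=1)]

def suggest_stop_filter(text):
    """Remove stop words, except a last token not followed by a separator."""
    tokens = analyze(text)
    kept = []
    for i, (term, pos, end) in enumerate(tokens):
        is_last = i == len(tokens) - 1
        # True only for the final token when the input ends right after it.
        unterminated_last = is_last and end == len(text)
        if term in STOP_WORDS and not unterminated_last:
            continue
        kept.append((term, pos))
    return kept

print(suggest_stop_filter("The The"))   # [('the', 2)]
print(suggest_stop_filter("The The "))  # []
print(suggest_stop_filter("the"))       # [('the', 1)]
```

Note that in this model a query consisting of a single stop word with no trailing separator keeps that word as a query term.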
Ashraful Islam answered Dec 26 '22 14:12