Is Solr SuggestComponent able to return shingles instead of whole field values?

I use Solr 5.0.0 and want to build an autocomplete feature that generates suggestions from the word n-grams (shingles) of my documents. The problem is that a suggest query only returns complete "terms" of the search field, i.e. whole field values, which can be extremely long.

CURRENT PROBLEM:

Input: "so"
Suggestions: "......extremely long text son long text continuing......"

"......next long text solar next text continuing......"

GOAL:

Input: "so"

Suggestions with shingles:

"son"

"solar"

"solar test"

etc.

solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent"
                 enable="${solr.suggester.enabled:true}">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title_and_description_suggest</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">autocomplete</str>
    <str name="queryAnalyzerFieldType">autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

schema.xml:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true" outputUnigramsIfNoShingles="true"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

I want each autocomplete suggestion to contain at most 3 words. Is this possible with the SuggestComponent, or how would you do it? No matter what I try, I always receive the complete field value of the matching documents.

Is that expected behaviour, or what am I doing wrong?

Many thanks in advance

Stefan asked Mar 17 '23 12:03


1 Answer

In schema.xml, define the field type as follows:

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
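To see why this index analyzer yields short, word-level suggestions, here is a minimal Python approximation of it (a sketch, not Lucene itself): split on whitespace as WhitespaceTokenizerFactory does, lowercase, then emit every single word plus every run of 2 to 5 consecutive words. Single words appear because ShingleFilterFactory's `outputUnigrams` defaults to true. The helper name `shingle_terms` and the sample text are illustrative assumptions.

```python
def shingle_terms(text, min_size=2, max_size=5, output_unigrams=True):
    """Approximate the text_autocomplete index analyzer:
    WhitespaceTokenizer -> LowerCaseFilter -> ShingleFilter (hypothetical helper)."""
    # Split on whitespace only; punctuation stays attached, as with WhitespaceTokenizer.
    tokens = [t.lower() for t in text.split()]
    terms = []
    for i in range(len(tokens)):
        if output_unigrams:
            terms.append(tokens[i])
        # Join consecutive tokens with a single space (ShingleFilter's default separator).
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                terms.append(" ".join(tokens[i:i + size]))
    return terms


if __name__ == "__main__":
    print(shingle_terms("Solar next text"))
    # -> ['solar', 'solar next', 'solar next text', 'next', 'next text', 'text']
```

Each emitted string becomes its own indexed term, so faceting on the field counts short phrases rather than whole field values; passing `max_size=3` mimics `maxShingleSize="3"`.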

Then define your field in schema.xml as follows:

<field name="example_field" type="text_autocomplete" indexed="true" stored="true"/>

Write your query as follows:

query?q=*:*&
rows=0&
facet=true&
facet.field=example_field&
facet.limit=-1&
wt=json&
indent=true&
facet.prefix=so
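A minimal Python sketch that assembles the same request with the standard library. The base URL (host, port and the core name `mycore`) is a placeholder assumption; `/query` mirrors the handler used above (use `/select` if your solrconfig.xml does not define `/query`), and `q=*:*` is written as Solr's explicit match-all query.

```python
from urllib.parse import urlencode

# Placeholder endpoint: adjust host, port and core/collection name.
SOLR_QUERY_URL = "http://localhost:8983/solr/mycore/query"


def facet_suggest_url(prefix, field="example_field"):
    """Build a facet-based autocomplete URL (hypothetical helper)."""
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed, no documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,       # return every matching term
        "wt": "json",
        "indent": "true",
        # facet.prefix is matched against indexed terms without analysis,
        # so lowercase it to line up with the LowerCaseFilter output.
        "facet.prefix": prefix.lower(),
    }
    return SOLR_QUERY_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_suggest_url("So"))
```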

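In the facet.prefix parameter, specify the text the user has typed so far ('so' in this example). facet.prefix is compared against the indexed terms without analysis, so lowercase the input before sending it. If you need fewer than 5 words per suggestion (at most 3, as in the question), reduce maxShingleSize in the fieldType definition accordingly. Note that when facet.limit is negative, Solr's default facet.sort is index (lexicographic order); add facet.sort=count to get the suggestions in decreasing order of frequency.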
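With wt=json, Solr returns each facet field as a flat array that alternates term and count (the default json.nl=flat). The hypothetical helper below turns that array into (term, count) pairs, most frequent first, which is useful if you sort or trim on the client. The sample response is made up for illustration.

```python
import json


def facet_suggestions(response_text, field="example_field", limit=10):
    """Turn Solr's flat facet list into (term, count) pairs, most frequent first."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # json.nl=flat layout: [term1, count1, term2, count2, ...]
    pairs = list(zip(flat[0::2], flat[1::2]))
    # Decreasing count; ties broken alphabetically so the order is deterministic.
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs[:limit]


# Made-up sample response, for illustration only.
SAMPLE = '''{"facet_counts": {"facet_fields": {"example_field":
    ["solar", 3, "solar next", 2, "son", 5]}}}'''

if __name__ == "__main__":
    print(facet_suggestions(SAMPLE))
    # -> [('son', 5), ('solar', 3), ('solar next', 2)]
```

If the request already carries facet.sort=count, Solr returns the list in that order and the client-side sort only matters for trimming.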

Utsav Chatterjee answered Apr 26 '23 04:04