
Solr suggester duplicate suggestions

I am trying to use the Solr 5 suggester. Suggestions work, but I am getting duplicate suggestions. I tried grouping on the suggest request, but it has no effect. How can I prevent duplicate suggestions?

Here are the relevant parts of my schema.xml:

<field name="Name" type="suggest" indexed="true" stored="true" multiValued="false"/>  
...
<fieldType name="suggest" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="suggestAnalyzerFieldType">suggest</str>
    <str name="exactMatchFirst">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Name</str>
    <str name="weightField">Price</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnStartup">false</str>
    <str name="preserveSep">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">5</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
    <str>query</str>
  </arr>
</requestHandler>

Example output for "acer" suggestions with these parameters:

/suggest?&suggest.dictionary=mySuggester&suggest.q=acer

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<lst name="suggest">
<lst name="mySuggester">
<lst name="acer">
<int name="numFound">5</int>
<arr name="suggestions">
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2350</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2099</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2000</long>
<str name="payload"/>
</lst>
</arr>
</lst>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

As you can see, the suggestion Acer V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3" appears three times.

Grouping does not help either:

suggest?&suggest.dictionary=mySuggester&suggest.q=acer&group=true&group.field=Name

 <response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">90</int>
</lst>
<lst name="suggest">
<lst name="mySuggester">
<lst name="acer">
<int name="numFound">5</int>
<arr name="suggestions">
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2350</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2099</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2000</long>
<str name="payload"/>
</lst>
</arr>
</lst>
</lst>
</lst>
<lst name="grouped">
<lst name="Name">
<int name="matches">0</int>
<arr name="groups"/>
</lst>
</lst>
</response>
kkurt asked Nov 01 '22 04:11



1 Answer

You're using the DocumentDictionaryFactory dictionary implementation, which stores suggestion terms per document. So if the same suggestion term is present in multiple documents, every one of those instances is returned.

To prevent this, you can either:

  1. Write an intercepting API that reads the suggestions from Solr (e.g. 30 at a time) and deduplicates them before returning the data
  2. Use another dictionary implementation, such as FileDictionaryFactory or HighFrequencyDictionaryFactory
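
As a rough illustration of option 1, here is a minimal Python sketch of the deduplication step. It assumes the suggester response has already been fetched with wt=json and parsed into a dict shaped like the XML output in the question (suggest, then dictionary name, then query, then suggestions). The function name dedupe_suggestions and its parameters are hypothetical, and treating two entries as duplicates only when their term strings match exactly is an assumption, not something the suggester prescribes.

```python
from typing import Any


def dedupe_suggestions(
    response: dict[str, Any],
    dictionary: str,
    query: str,
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Return at most `limit` suggestions with distinct terms.

    Entries are kept in the order they appear in the response, so the
    first occurrence of each term wins.
    """
    entries = response["suggest"][dictionary][query]["suggestions"]
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for entry in entries:
        if len(unique) >= limit:
            break
        # Assumption: identical term strings (highlight markup included)
        # count as the same suggestion.
        term = entry["term"].strip()
        if term in seen:
            continue
        seen.add(term)
        unique.append(entry)
    return unique
```

To have enough candidates left after deduplication, request more suggestions than you intend to show (for example suggest.count=30) and pass the number you actually want as limit.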
Yashveer Rana answered Nov 15 '22 08:11
