
Solr returns only one collation for Suggester Component

I use Solr 3.6 and would like to use collations from the Suggester as an autocomplete solution for multi-term searches. Unfortunately, the Suggester returns only one collation for a multi-term search, even when many suggestions exist for each individual term. Based on my test searches and the underlying indexed data, I'm sure that more collations must exist.

Is something wrong with my Suggester configuration?

    <!--configuration -->
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
  <str name="field">text</str>  <!-- the indexed field to derive suggestions from -->
  <!--<float name="threshold">0.0005</float> disabled for test-->
  <str name="buildOnCommit">true</str>
</lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">suggest</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.count">200</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">10</str>
</lst>
<arr name="components">
  <str>suggest</str>
</arr>
</requestHandler> 

Example response for q=bio+ber:

<response>
<lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
</lst>
<lst name="spellcheck">
    <lst name="suggestions">
        <lst name="bio">
            <int name="numFound">27</int>
            <int name="startOffset">0</int>
            <int name="endOffset">3</int>
            <arr name="suggestion">
                <str>bio</str>
                <str>bio-estetica</str>
                <str>bio-kosmetik</str>
                                    ...
            </arr>
        </lst>
        <lst name="ber">
            <int name="numFound">81</int>
            <int name="startOffset">4</int>
            <int name="endOffset">7</int>
            <arr name="suggestion">
                <str>beratung</str>
                <str>bern</str>
                ...
            </arr>
        </lst>
        <str name="collation">bio beratung</str>
    </lst>
</lst>
</response>
asked May 11 '12 07:05 by Adrian


1 Answer

I was having the same problem as you, and I managed to solve it. It turns out there are several things you need to know in order to get multiple collations to work properly.

First, you must list a QueryComponent under the components of the "suggest" requestHandler in your solrconfig.xml. Without it, the requestHandler has no way to run each candidate collation against the index, so it cannot tell how many hits each corrected query has, and you get only one collation back. If you had added spellcheck.collateExtendedResults=true to your query, you would have seen that the hits were 0, which shows that Solr didn't bother to check the corrected query against the index.

Solr hints at this with a somewhat opaque log message:

INFO: Could not find an instance of QueryComponent. Disabling collation verification against the index.

The easiest way to add it is to use the default QueryComponent, which is registered under the name "query". So in the XML you posted above, you'd change the "components" part to:

<arr name="components">
  <str>suggest</str>
  <str>query</str>
</arr>

Secondly, you need to set spellcheck.maxCollations to be more than 1 (duh), and less intuitively, you need to set spellcheck.maxCollationTries to some large number (e.g. 1000). If either of these is left at its default (1 for maxCollations, 0 for maxCollationTries), Solr will only give you one collation. You also need to set spellcheck.count to be greater than 1, so that each term has more than one suggestion to combine.
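
To make the first two points concrete, here is a sketch of how the "/suggest" handler from the question might look with both changes applied. Putting spellcheck.maxCollationTries and spellcheck.collateExtendedResults into the handler defaults (instead of passing them on each request) is my own assumption about what is convenient, and 1000 is just the example value from above, not a tuned recommendation.

```xml
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <!-- must be greater than 1 so each term has several suggestions to combine -->
    <str name="spellcheck.count">200</str>
    <str name="spellcheck.collate">true</str>
    <!-- upper bound on the number of collations returned -->
    <str name="spellcheck.maxCollations">10</str>
    <!-- assumption: example value; how many candidate collations Solr may test against the index -->
    <str name="spellcheck.maxCollationTries">1000</str>
    <!-- assumption: enabled here so hit counts show up while debugging -->
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
    <!-- the default QueryComponent, needed to verify collations against the index -->
    <str>query</str>
  </arr>
</requestHandler>
```

Any of these defaults can still be overridden per request with the matching URL parameter.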

Thirdly, you need to modify the query to include the field you want to search against, and the terms must be surrounded by quotes to ensure proper collation. So in the case of your query:

q=bio+ber

This really should be:

q=text:"bio+ber"

In your case "text" is the default field, so you can leave the field name out. In my case I was using a non-default field, so I had to specify it. Otherwise, Solr would count the hits against the "text" field, every collation would have 0 hits, and the ranking would be useless.

So in my case, the query looked like this:

q=my_field:"brain+c"
&spellcheck.count=5
&spellcheck.maxCollations=10
&spellcheck.maxCollationTries=1000
&spellcheck.collateExtendedResults=true

And my response looked like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="brain">
        <int name="numFound">1</int>
        <int name="startOffset">15</int>
        <int name="endOffset">20</int>
        <arr name="suggestion">
          <str>brain</str>
        </arr>
      </lst>
      <lst name="c">
        <int name="numFound">4</int>
        <int name="startOffset">21</int>
        <int name="endOffset">23</int>
        <arr name="suggestion">
          <str>cancer</str>
          <str>cambrian</str>
          <str>contusion</str>
          <str>cells</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">my_field:"brain cancer"</str>
        <int name="hits">2</int>
        <lst name="misspellingsAndCorrections">
          <str name="brain">brain</str>
          <str name="c">cancer</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">my_field:"brain contusion"</str>
        <int name="hits">1</int>
        <lst name="misspellingsAndCorrections">
          <str name="brain">brain</str>
          <str name="c">contusion</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">my_field:"brain cells"</str>
        <int name="hits">1</int>
        <lst name="misspellingsAndCorrections">
          <str name="brain">brain</str>
          <str name="c">cells</str>
        </lst>
      </lst>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

Success!

answered Sep 19 '22 15:09 by nlawson