
Solr - example spell checker not working

I have set up the spellchecker for the example installation configuration that comes with Solr. I have followed their instructions for the spellchecker here: [http://wiki.apache.org/solr/SpellCheckComponent][1]

The problem is that, even after following the instructions exactly, I still cannot get it to work.

The response when I build (http://localhost:8983/solr/spell?q=*:*&spellcheck.build=true&spellcheck.q=delll%20ultrashar&spellcheck=true)

looks as follows:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">14</int>
    </lst>
    <str name="command">build</str>
    <result name="response" numFound="17" start="0">
    ...
    </result>
    <lst name="spellcheck">
        <lst name="suggestions"/>
    </lst>
</response>

And when I query with http://localhost:8983/solr/spell?q=*:*&spellcheck.q=delll+ultrashar&spellcheck=true&spellcheck.extendedResults=true

I get the following response:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
    </lst>
    <result name="response" numFound="17" start="0">
    ...
    </result>
    <lst name="spellcheck">
        <lst name="suggestions">
            <bool name="correctlySpelled">false</bool>
        </lst>
    </lst>
</response>

What gives? Am I missing something in my schema.xml?

The schema.xml is here: http://www.developermill.com/schema.xml

The solrconfig.xml is here: http://www.developermill.com/solrconfig.xml

The only change to the example files was the addition of the following in the solrconfig.xml:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <lst name="spellchecker">
    <!--
        Optional, it is required when more than one spellchecker is configured.
        Select non-default name with spellcheck.dictionary in request handler.
    -->
    <str name="name">default</str>
    <!-- The classname is optional, defaults to IndexBasedSpellChecker -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!--
        Load tokens from the following field for spell checking,
        analyzer for the field's type as defined in schema.xml are used
    -->
    <str name="field">spell</str>
    <!-- Optional, by default use in-memory index (RAMDirectory) -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Set the accuracy (float) to be used for the suggestions. Default is 0.5 -->
    <str name="accuracy">0.7</str>
    <!-- Require terms to occur in 1/100th of 1% of documents in order to be included in the dictionary -->
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
  <!-- Example of using different distance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <!-- Use a different Distance Measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>

  </lst>

  <!-- This field type's analyzer is used by the QueryConverter to tokenize the value for "q" parameter -->
  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>
<!--
    The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens.  Uses a simple regular expression
    to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser.

Optional, defaults to solr.SpellingQueryConverter
-->
<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>

<!--  Add to a RequestHandler
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     NOTE:  YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.  THIS IS DONE HERE SOLELY FOR
     THE SIMPLICITY OF THE EXAMPLE.  YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->
<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
    <str name="spellcheck.dictionary">default</str>
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!--  The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <!--  Add to a RequestHandler
       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
       REPEAT NOTE:  YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.  THIS IS DONE HERE SOLELY FOR
       THE SIMPLICITY OF THE EXAMPLE.  YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Jerome Erasmus asked Nov 05 '22 06:11


1 Answer

The textSpell field type definition is in the wrong place. The following fragment should sit within the <types> element of schema.xml:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"  expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
    </analyzer>
</fieldType>
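To make the placement concrete, here is a rough sketch of how the pieces could sit in schema.xml. The overall layout follows the old example schema; the attributes on the spell field and the copyField source (name) are assumptions for illustration, not taken from the posted files, so adjust them to your own fields.

```xml
<!-- schema.xml, abbreviated sketch; only the spell-related parts are shown -->
<schema name="example">
  <types>
    <!-- ... other fieldType definitions ... -->
    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <!-- index and query analyzers exactly as in the fragment above -->
    </fieldType>
  </types>

  <fields>
    <!-- ... other fields ... -->
    <!-- Assumed attributes: the field referenced by <str name="field">spell</str> in solrconfig.xml -->
    <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
  </fields>

  <!-- Assumed source field: copy whichever fields should feed the spelling dictionary -->
  <copyField source="name" dest="spell"/>
</schema>
```

Since the dictionary is built from the indexed terms of the spell field, you will most likely need to reindex your documents and then issue spellcheck.build=true again after changing the schema.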

Once that is fixed, everything should work, I think. I'd also suggest cleaning up your example a bit, since it currently contains just about every option you can configure. Keep only what you really need.
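As a starting point for that cleanup, a trimmed solrconfig.xml fragment might look something like the sketch below. It keeps a single index-based spellchecker and drops the jarowinkler one. The handler name /spell is an assumption chosen to match the URLs in the question (the posted config registers /spellCheckCompRH instead), so use whichever name you actually query.

```xml
<!-- Minimal spellcheck setup (sketch): one dictionary built from the "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyzer of this field type is used to tokenize the query for spell checking -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

<!-- Assumed handler name; must match the path used in the request URL -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Options such as accuracy and thresholdTokenFrequency are left at their defaults here; add them back one at a time once suggestions are coming through.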

javanna answered Nov 15 '22 06:11