
Solr does not highlight some words

Tags: highlight, solr

I configured Solr 4.10 (also 5.3) with highlighting. It works fine for most words, but for some words Solr returns the matching documents without highlighting the word in them.

What could cause this?

solrconfig.xml

 <requestHandler name="/select" class="solr.SearchHandler">
 <lst name="defaults">
   <str name="wt">json</str>
   <str name="indent">true</str>
   <str name="defType">edismax</str>
   <str name="bf">product(concount)</str>
   <str name="df">text bio text_syn text_syn_other</str>
   <str name="qf">
    text^25 bio^16 text_syn^8 text_syn_other^3
   </str>
   <str name="hl">on</str>
   <str name="hl.fl">text bio text_syn text_syn_other</str>
   <str name="hl.preserveMulti">true</str>
   <str name="hl.encoder">html</str>
   <str name="f.text.hl.fragsize">100</str>
   <str name="hl.snippets">20</str>
   <arr name="components">
     <str>highlight</str>
   </arr>
 </lst>
 </requestHandler>

schema.xml

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" />
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" />

<field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" />

<field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" />

<field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" />

<field name="concount" type="long" indexed="true" stored="true" multiValued="false" />

<field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" />

<copyField source="text" dest="text_syn"/>
<copyField source="bio" dest="text_syn"/>
<copyField source="text" dest="text_syn_other"/>
<copyField source="bio" dest="text_syn_other"/>

For the query http://localhost:8983/solr/select?q=senior I get documents containing the word senior, but the word is not highlighted in the highlighting section of the Solr response.

UPDATE 1: I found out that the word senior appears in my synonyms_abbr.txt file, in the line senior,lead. When I commented out that line, or swapped the order of the words to lead,senior, the word senior surprisingly started being highlighted. Any ideas?

UPDATE 2: Words from synonyms.txt and synonyms_other.txt are highlighted normally, but words from synonyms_abbr.txt behave strangely. For example, if I have the line lead,head,senior in synonyms_abbr.txt, then:

the queries http://localhost:8983/solr/select?q=senior and http://localhost:8983/solr/select?q=head do not highlight any word,
the query http://localhost:8983/solr/select?q=lead highlights not only the word lead, but also head and senior.


asked Oct 20 '15 11:10

Mher

2 Answers

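From your Update 2 it is clear that only the first word of lead,head,senior is actually used for synonym matching and highlighting. With expand="false" on the index analyzer, head and senior are rewritten to lead in the text and bio fields, and since the query analyzer has no synonym filter, a query for senior no longer matches those fields, so there is nothing to highlight in them.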

If you look at the docs on the Solr wiki (https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters), they describe the effect of the expand parameter:

The synonyms parameter names an external file defining the synonyms. If ignoreCase is true, matching will lowercase before checking equality. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.

The site also presents an example:

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

This seems consistent with the behaviour you are observing. It implies that you should either change the synonym filter definition in schema.xml to use expand="true", or change your synonyms file to use explicit mappings.

Additionally, since the index analyzer runs at indexing time, you will have to reindex your documents for the change to take effect.
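
As a minimal sketch of the first option, assuming the text_en type from the question is otherwise kept exactly as posted, the field type could look like this. Only the expand attribute on the index-time synonym filter changes; I have not tested this against the asker's data.

```xml
<!-- Sketch only: text_en from the question, with expand switched to "true". -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- expand="true": lead, head and senior are all kept in the index
         instead of being reduced to the first word of the line -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- unchanged from the question: no synonym filter at query time -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place and the documents reindexed, a query for senior should match the text and bio fields directly, which is what the highlighter needs in order to mark the word.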

answered Sep 28 '22 13:09

vvs

Some of your fields are not stored, so they cannot be returned or highlighted. Since they are indexed, they are still searchable. Change your schema to have stored="true" for all the fields you want to highlight.

<field name="text_syn" type="text_en_syn" indexed="true" stored="true" multiValued="true" />
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="true" multiValued="true" />

Looking at your config, I presume highlighting works on the fields bio and text?

answered Sep 28 '22 11:09

ilinca
