Solr: how to highlight the whole search phrase only?

Tags:

solr

A) I need to perform a phrase search. The search results contain the exact phrase matches, but in the highlighted parts the phrase is tokenized, i.e. each word is wrapped in its own <em> tag. This is what I get when I search for the phrase "Day 1":

<arr name="post">
  <str><em>Day</em> <em>1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>

This is what I want to receive as a result:

<arr name="post">
  <str><em>Day 1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>

This is the query I am running, as entered in the admin console:

q = day 1 
fq = post:"day 1" OR title:"day 1"
hl = true
hl.fl = title,post

select?q=day+1&fq=post%3A%22day+1%22+OR+title%3A%22day+1%22&wt=xml&indent=true&hl=true&hl.fl=title%2Cpost&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

These are my fields:

     <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

This is the Solr schema section for my field type text_general:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
    <filter class="solr.GreekLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

B) The highlighting section also contains more disturbing results, i.e. single fragments are highlighted instead of the whole phrase as expected:

    .where you get to see all of Athens ... <em>Day</em> 2 - Carmens

I don't want to see this result in the highlighting section (I only need to see both words "Day 1" together). Any ideas?

I am reading the highlighting section of the Solr documentation, but there is really not a single example in it!

George Papatheodorou asked Feb 13 '23 00:02



1 Answer

The parameter that needed to be added was hl.q, which basically means "I want this phrase to be highlighted", together with hl.usePhraseHighlighter=true and hl.useFastVectorHighlighter=true.

So adding the following to my original query worked:

    &hl.q="Day+1"&hl.usePhraseHighlighter=true&hl.useFastVectorHighlighter=true
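For anyone who builds the request in code rather than in the admin console, here is a minimal sketch of how the same parameters could be assembled and URL-encoded. It uses only the Python standard library; the helper name `build_highlight_query` is hypothetical and only for illustration, and the parameter values are the ones from the question and this answer.

```python
from urllib.parse import urlencode, parse_qs


def build_highlight_query(phrase, fields=("post", "title")):
    """Build the query string of a select request that highlights `phrase` as a whole.

    Hypothetical helper for illustration; it only assembles and encodes
    the parameters discussed above, it does not contact Solr.
    """
    quoted = '"' + phrase + '"'
    quoted_lower = quoted.lower()
    params = [
        ("q", phrase.lower()),
        # Restrict results to documents containing the exact phrase.
        ("fq", " OR ".join(f"{field}:{quoted_lower}" for field in fields)),
        ("wt", "xml"),
        ("indent", "true"),
        ("hl", "true"),
        ("hl.fl", ",".join(fields)),
        ("hl.simple.pre", "<em>"),
        ("hl.simple.post", "</em>"),
        # The three parameters added in this answer.
        ("hl.q", quoted),
        ("hl.usePhraseHighlighter", "true"),
        ("hl.useFastVectorHighlighter", "true"),
    ]
    # urlencode turns spaces into '+' and quotes into %22.
    return urlencode(params)


query_string = build_highlight_query("Day 1")
print("select?" + query_string)

# Decoding the string again shows the values Solr would receive.
decoded = parse_qs(query_string)
print(decoded["hl.q"])
```

Passing `fields=("post",)` gives the narrower filter used for B) below.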

For B) I changed fq = post:"day 1" OR title:"day 1" to fq = post:"day 1". I know that the latter is less than what I need, but nevertheless it works.

The FastVectorHighlighter configuration that was used (the highlighted field needs termVectors, termPositions and termOffsets enabled):

    <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" termVectors="true" termPositions="true" termOffsets="true"/>
George Papatheodorou answered Feb 27 '23 02:02
