I've read the Solr highlighting wiki document several times, searched everywhere, but cannot get even basic highlighting to work with my Solr installation. I am running Solr 3.5 on the demo Jetty 6.1 server.
I have indexed 250K documents, and am able to search them just fine. Other than configuring my document field definitions, most of the Solr configuration is "stock," although I have temporarily commented out the solrconfig.xml's "Highlighting defaults" to make sure they aren't causing this problem:
<!-- Highlighting defaults
<str name="hl">on</str>
<str name="hl.fl">title snippet</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str> -->
My URL query string is very simple. I've tried many variations, but here is my latest, which runs the most basic query:
hl=on&hl.fl=title&indent=on&version=2.2&q=toyota&fq=&start=0&rows=1&fl=*%2Cscore
Here is the resulting XML:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">32</int>
<lst name="params">
<str name="explainOther"/>
<str name="indent">on</str>
<str name="hl.fl">title</str>
<str name="wt"/>
<str name="hl">true</str>
<str name="version">2.2</str>
<str name="rows">1</str>
<str name="fl">*,score</str>
<str name="start">0</str>
<str name="q">toyota</str>
<str name="qt"/>
<str name="fq"/>
</lst>
</lst>
<result name="response" numFound="9549" start="0" maxScore="0.9960097">
<doc>
<float name="score">0.9960097</float>
<str name="id">2-33-200</str>
<str name="title">1992 Toyota Camry 2.2L CV Boots</str>
</doc>
</result>
<lst name="highlighting">
<lst name="2-33-200"/>
</lst>
</response>
How can I debug this issue further? Thanks!
Edit: Here is the <highlighting> section from solrconfig.xml. As I stated, it is stock. That could be the issue, but I'm new to Solr and not familiar with the highlighting ins and outs yet (obviously).
<highlighting>
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap"
default="true"
class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<!-- A regular-expression-based fragmenter
(for sentence extraction)
-->
<fragmenter name="regex"
class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>
<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<!-- Configure the standard formatter -->
<formatter name="html"
default="true"
class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<!-- Configure the standard encoder -->
<encoder name="html"
class="solr.highlight.HtmlEncoder" />
<!-- Configure the standard fragListBuilder -->
<fragListBuilder name="simple"
default="true"
class="solr.highlight.SimpleFragListBuilder"/>
<!-- Configure the single fragListBuilder -->
<fragListBuilder name="single"
class="solr.highlight.SingleFragListBuilder"/>
<!-- default tag FragmentsBuilder -->
<fragmentsBuilder name="default"
default="true"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<!--
<lst name="defaults">
<str name="hl.multiValuedSeparatorChar">/</str>
</lst>
-->
</fragmentsBuilder>
<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default"
default="true"
class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!? 	 </str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator"
class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
<str name="hl.bs.type">WORD</str>
<!-- language and country are used when constructing Locale object. -->
<!-- And the Locale object will be used when getting instance of BreakIterator -->
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
Edit: Although my "title" field was initially set to indexed="false", I have since tested setting it to true (no change, still no highlighting), and also adding termVectors="true" termPositions="true" termOffsets="true", still with no effect. (I tried these based on reading this post on SO.)
And here is my "title" field definition as of now:
<field name="title" type="string" indexed="true" stored="true" required="true" termVectors="true" termPositions="true" termOffsets="true" />
Initially I started with:
<field name="title" type="string" indexed="false" stored="true" required="true" />
Edit: I've now also tried this definition:
<field name="title" type="text_general" indexed="true" stored="true" required="true" termVectors="true" termPositions="true" termOffsets="true" />
and no change in highlighting, still not working. My text_general definition is the default one that comes with Solr's demo:
<!-- A general text field that has reasonable, generic
cross-language defaults: it tokenizes with StandardTokenizer,
removes stop words from case-insensitive "stopwords.txt"
(empty by default), and down cases. At query time only, it
also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Edit: I've now also tried re-indexing title with the text_en_splitting fieldtype, which uses WhitespaceTokenizerFactory instead of StandardTokenizerFactory, and still no highlighting. For what it's worth, I am using the standard query parser, which according to debugQuery=on is the LuceneQParser.
FINALLY! Thanks to @javanna for the help. I've done a lot of experimenting, and the two key takeaways are:
My definition now appears as:
<field name="Title" type="text_general" indexed="false" stored="true" required="true" />
And my solrconfig.xml has this set:
<str name="hl">on</str>
<str name="hl.fl">Title</str>
Your highlighting request looks fine, but your solrconfig.xml looks a bit messy. Unfortunately, the example you started from uses basically all the available options, and I guess you don't need them. Unless you need something different from the defaults, I'd start by commenting out all your highlighting configuration, as well as your default parameters. Then I'd play around with the URL parameters you need, just a couple to start: hl=on and hl.fl=title. Once you've found the right parameters, you can configure them as defaults.
That said, given your title fieldType, I suspect the field isn't tokenized, unless you changed the default string type definition. In that case your query wouldn't match the title field, which is why you don't get highlighting on it. Are you maybe using edismax (or dismax)? If so, what is your qf parameter? Is it possible that the toyota term is in another field that matches your query? If you're using edismax, you can try searching for q=title:toyota and see if you get results.
You can also check where your match comes from by enabling debugQuery=on and inspecting the debug output.
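As a minimal sketch of the two checks above (a field-qualified query plus debugQuery=on), the request URL could be assembled like this. The base URL is an assumption based on the stock example server, and build_select_url is a hypothetical helper, not part of Solr:

```python
from urllib.parse import urlencode

# Assumed address of the stock Solr example server; adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, highlight_field, debug=False):
    """Return a /select URL that requests highlighting on a single field."""
    params = [
        ("q", query),
        ("hl", "on"),
        ("hl.fl", highlight_field),
        ("rows", "1"),
    ]
    if debug:
        # Ask Solr to include the parsed query and scoring explanation.
        params.append(("debugQuery", "on"))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    # Restrict the search to the title field and turn on debug output.
    print(build_select_url("title:toyota", "title", debug=True))
```

If the field-qualified query returns no documents while the plain q=toyota does, the match is coming from some other field, which would explain the empty highlighting entry.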
UPDATE
I saw you changed the title fieldType to text_general, but this doesn't change anything because that type isn't tokenized on whitespace. You haven't said yet which query parser you're using; anyway, if I'm right, you should use WhitespaceTokenizerFactory instead of the StandardTokenizerFactory:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
After that, remember to reindex all your data, otherwise you won't see any change.
Basically, if you index something like toyota whatever without tokenizing on whitespace, you won't get any result searching for toyota, and you won't even have toyota highlighted on that field because it doesn't match. My assumption is that you're using the dismax or edismax query parser and searching on more than one field, and some of them, but not title, match your search. That's why you'd get results but no highlighting on title, the only field you selected for highlighting. Can you post the results you get searching for toyota? Is the toyota term in some fields other than title?
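To illustrate the point about untokenized fields, here is a toy model in Python. It is not Solr's analysis chain; the function names are hypothetical, and it only mimics the idea that a string field keeps the whole value as one term while a tokenized text field stores individual lowercased words:

```python
def index_as_string(value):
    # An untokenized string field keeps the whole value as a single term.
    return {value}


def index_as_text(value):
    # A tokenized text field splits on whitespace and lowercases each token.
    return {token.lower() for token in value.split()}


def matches(indexed_terms, query_term):
    # A term query matches only when that exact term exists in the index.
    return query_term in indexed_terms


if __name__ == "__main__":
    title = "1992 Toyota Camry 2.2L CV Boots"
    # The string field only contains the full title, so "toyota" is absent.
    print(matches(index_as_string(title), "toyota"))
    # The tokenized field contains "toyota" as its own term.
    print(matches(index_as_text(title), "toyota"))
```

In this model a field with no matching term has nothing to highlight, which is consistent with the empty highlighting entry in the question's response.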