 

Boost Solr results based on the field that contained the hit

I was browsing the web looking for an indexing and search framework and stumbled upon Solr. One feature we absolutely need is the ability to boost results based on which field contained the hit.

A small example:

Consider a record like this:

<movie>
  <title>The Dark Knight</title>
  <alternative_title>Batman Begins 2</alternative_title>
  <year>2008</year>
  <director>Christopher Nolan</director>
  <plot>Batman, Gordon and Harvey Dent are forced to deal with the chaos unleashed by an anarchist mastermind known only as the Joker, as it drives each of them to their limits.</plot>
</movie>

I want to combine, for example, the title, alternative_title and plot fields into one search field, which doesn't look too difficult after reading the Solr/Lucene documentation and tutorials.
However, I also want movies with a hit in title to score higher than movies with a hit in alternative_title, which in turn should score higher than movies with a hit only in the plot field.
Is there any way to specify this kind of scoring in the XML, or do we need to develop a custom scoring algorithm?

Please also note that the example I've given is fictional and the real data will probably contain 100+ fields.

TomFor asked Mar 11 '10 14:03


People also ask

How to boost a query in Solr?

The default boost for a field is 1, so setting a value between 0 and 1 lowers the document's boost. It is also possible to add different boosts to different fields of a document. The only requirement here is that the boosted fields must store the norms (the “omitNorms” attribute in the schema must be set to “false”).

How do I search a specific field in Solr?

If you do not specify a field in a query, Solr searches only the default field. Alternatively, you can specify a different field or a combination of fields in a query. To specify a field, type the field name followed by a colon ":" and then the term you are searching for within the field.


1 Answer

This is what Solr's DisMax query parser was designed for. See http://wiki.apache.org/solr/DisMaxRequestHandler

There are a lot of parameters, but the main one you need to customize is "qf" (query fields), which specifies which fields are searched and the boost applied to each. So if you want title to dominate, you might specify something like:

title^10 alternative_title^2 director^1 plot^1

as the value of the qf parameter. You can set this up by customizing the example configuration and experimenting from there.
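
As a rough illustration of passing qf per request instead of in the configuration, here is a minimal Python sketch that only builds the request URL. The host, port, and the core name "movies" are assumptions for illustration, not taken from the question. Selecting the parser with defType=dismax is also an assumption about your Solr version; older example configurations select a dismax request handler in another way.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name for your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/movies/select"

# Per-field boosts, mirroring the qf value shown in the answer.
FIELD_BOOSTS = {
    "title": 10,
    "alternative_title": 2,
    "director": 1,
    "plot": 1,
}


def build_qf(boosts):
    """Join field boosts into the space-separated field^boost form used by qf."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


def build_dismax_url(user_query, boosts, base_url=SOLR_SELECT_URL):
    """Return a select URL asking Solr to parse the query with DisMax and the given boosts."""
    params = {
        "defType": "dismax",  # assumption: parser chosen per request
        "q": user_query,
        "qf": build_qf(boosts),
    }
    # urlencode escapes spaces and the ^ character for use in a URL.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_dismax_url("dark knight", FIELD_BOOSTS))
```

Keeping the boosts in a dictionary makes it easy to tune them, which matters if the real schema has 100+ fields. The same qf string can also be placed in the request handler defaults once the values have settled.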

KenE answered Nov 27 '22 07:11