How do the various boosting types work together in Django, django-haystack and Solr?
I am having trouble getting the most obvious search results to appear first. If I search for "caring for others" and get 10 results, the object with the title "caring for others" appears second in the results, after "caring for yourself".
I have document-boosted Category objects by a factor of factor = 2.0 - ((the mptt tree level)/10), so 1.9 for root nodes, 1.8 for second-level nodes, 1.7 for third-level nodes, and so on (or 190%, 180%, 170%, ...).
title is boosted by boost=1.5 (a positive factor of 150%).
content is boosted by boost=.5 (a negative factor of 50%).
I am currently not boosting any search terms.
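To make the document boost arithmetic concrete, here is a minimal sketch of the formula in isolation. The helper name category_boost is hypothetical and exists only for illustration. One thing worth checking: as far as I know, django-mptt numbers root nodes as level 0, in which case this formula gives 2.0 for roots, and the 1.9 figure only holds if levels start at 1.

```python
def category_boost(level, base_boost=2.0):
    """Mirror the formula above: 2.0 minus one tenth per tree level."""
    return base_boost - float(int(level)) / 10


# Print the boost for the first few tree levels (illustrative values only).
for level in range(4):
    print("level %d -> boost %.1f" % (level, category_boost(level)))
```

Running this against the real level values of a few Category objects is a quick way to confirm which boost each node actually receives at index time.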
I want to get a list of results containing Categories and Articles (I'm ignoring Articles until I get my Category results straight), with Categories weighted higher than Articles and titles weighted higher than content. I'm also trying to weight root category nodes higher than child nodes.
I feel like I'm missing a key concept somewhere.
I'm using haystack's built-in search form and search view.
I'm using the following package/lib versions:
Django==1.4.1
django-haystack==1.2.7
pysolr==2.1.0-beta
My Index Class
from haystack.indexes import SearchIndex, CharField, EdgeNgramField


class CategoryIndex(SearchIndex):
    """Categorization -> Category"""
    # Field-level boosts: title counts more, content and the document text less.
    text = CharField(document=True, use_template=True, boost=.5)
    title = CharField(model_attr='title', boost=1.5)
    content = CharField(model_attr='content', boost=.5)
    autocomplete = EdgeNgramField(model_attr='title')

    def prepare_title(self, obj):
        return obj.title

    def prepare(self, obj):
        data = super(CategoryIndex, self).prepare(obj)
        # Document-level boost: 2.0 minus 0.1 per mptt tree level.
        base_boost = 2.0
        base_boost -= (float(int(obj.level))/10)
        data['boost'] = base_boost
        return data
My search template at templates/search/categorization/category_text.txt:
{{ object.title }}
{{ object.content }}
I noticed that when I took {{ object.content }} out of my search template, records started appearing in the expected order. Why is this?
The DisMax query parser (and, from Solr 3.1 on, ExtendedDisMax) was created for exactly these needs. You can configure all the fields that you want searched (the 'qf' parameter), add a custom boost to each, and specify the fields where phrase hits are especially valuable and should add to the hit's score (the 'pf' parameter). You can also specify how many tokens of a search have to match, using a flexible rule pattern (the 'mm' parameter).
For example, the config could look like this (part of a request handler config entry in solrconfig.xml; I'm not familiar with how to do that with haystack, so this is plain Solr):
<str name="defType">dismax</str>
<str name="q.alt">*:*</str>
<str name="qf">text^0.5 title^1.5 content^0.5</str>
<str name="pf">text title^2 content</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
<int name="ps">100</int>
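The same settings can also be sent as per-request parameters instead of handler defaults. Below is a minimal sketch in Python that only builds the query string, without contacting Solr. The function name build_dismax_params is hypothetical, and the field names and boosts are simply copied from the config above. I have not verified how haystack itself would pass these through.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_dismax_params(user_query):
    """Return DisMax request parameters mirroring the config above."""
    return [
        ("q", user_query),
        ("defType", "dismax"),
        ("qf", "text^0.5 title^1.5 content^0.5"),  # fields to search, with boosts
        ("pf", "text title^2 content"),            # fields where phrase hits add score
        ("fl", "*,score"),                         # return all fields plus the score
        ("mm", "100%"),                            # every token must match
        ("ps", "100"),                             # phrase slop for 'pf'
    ]


# URL-encode the parameters so they can be appended to a select URL.
query_string = urlencode(build_dismax_params("caring for others"))
print(query_string)
```

Having 'fl' include the score is useful while tuning, because it lets you compare the scores of the two competing documents directly after each change to 'qf' or 'pf'.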
I don't know haystack, but it looks like it may provide DisMax functionality: https://github.com/toastdriven/django-haystack/pull/314
See the documentation for DisMax (it links to ExtendedDisMax as well): http://wiki.apache.org/solr/DisMaxQParserPlugin http://wiki.apache.org/solr/ExtendedDisMax