
Django, Haystack, Solr and Boosting

TL;DR

How do the various boosting types work together in Django, django-haystack and Solr?

I am having trouble getting the most obvious search results to appear first. If I search for "caring for others" and get 10 results, the object titled "caring for others" appears second in the results, after "caring for yourself".

Document Boosting

I have document-boosted Category objects by factor = 2.0 - ((the mptt tree level)/10), so 1.9 for root nodes, 1.8 for the second level, 1.7 for the third level, and so on (or 190%, 180%, 170%, and so on).
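
A minimal sketch of that formula in isolation, assuming the level is an integer tree depth. The name document_boost is hypothetical and not part of haystack or mptt. Note that if root nodes sit at level 0, which I believe is django-mptt's convention, this formula gives them 2.0 rather than 1.9.

```python
def document_boost(level, base=2.0):
    """Return the document boost for a tree node at the given depth."""
    # Hypothetical helper: subtract a tenth of the tree level from the base.
    # Rounding only hides floating-point noise in the result.
    return round(base - level / 10.0, 2)


# Print the boost for the first few tree levels.
for level in range(0, 4):
    print(level, document_boost(level))
```

Whether the first value applies to root nodes depends on how the tree numbers its levels, so it is worth checking obj.level on a root Category before relying on the percentages.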

Field Boosting

title is boosted by boost=1.5 (a positive factor of 150%).
content is boosted by boost=.5 (a negative factor of 50%).

Term Boosting

I am currently not boosting any search terms.

My Goal

I want to get a list of results containing Categories and Articles (I'm ignoring Articles until I get my Category results straight), with Categories weighted higher than Articles and titles weighted higher than content. I'm also trying to weight root category nodes higher than child nodes.

I feel like I'm missing a key concept somewhere.

Information

I'm using haystack's built-in search form and search view.

I'm using the following package/lib versions:

Django==1.4.1
django-haystack==1.2.7
pysolr==2.1.0-beta

My Index Class

from haystack.indexes import SearchIndex, CharField, EdgeNgramField


class CategoryIndex(SearchIndex):
    """Categorization -> Category"""
    # Field boosts: title counts more, text and content count less.
    text = CharField(document=True, use_template=True, boost=.5)
    title = CharField(model_attr='title', boost=1.5)
    content = CharField(model_attr='content', boost=.5)
    autocomplete = EdgeNgramField(model_attr='title')

    def prepare_title(self, obj):
        return obj.title

    def prepare(self, obj):
        data = super(CategoryIndex, self).prepare(obj)
        # Document boost: 2.0 minus a tenth of the mptt tree level.
        base_boost = 2.0
        base_boost -= (float(int(obj.level))/10)
        data['boost'] = base_boost
        return data

My search template, at templates/search/categorization/category_text.txt:

{{ object.title }}
{{ object.content }}

UPDATE

I noticed that when I took {{ object.content }} out of my search template, records started appearing in the expected order. Why is this?

Francis Yaconiello asked Sep 04 '12 20:09



1 Answer

The Dismax parser (and, from Solr 3.1 on, ExtendedDismax) was created for exactly these needs. You can configure all the fields that you want searched (the 'qf' parameter), add a custom boost to each, and specify the fields where phrase hits are especially valuable and should add to the hit's score (the 'pf' parameter). You can also specify how many tokens in a search have to match, using a flexible rule pattern (the 'mm' parameter).

For example, the config could look like this (part of a request handler entry in solrconfig.xml; I'm not familiar with how to do that with haystack, so this is plain Solr):

<str name="defType">dismax</str>
<str name="q.alt">*:*</str>
<str name="qf">text^0.5 title^1.5 content^0.5</str>
<str name="pf">text title^2 content</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
<int name="ps">100</int>
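
As far as I know, the same parameters can also be sent with each request instead of being fixed in solrconfig.xml. Below is a minimal Python sketch that only builds the query string. The helper name dismax_params is hypothetical, and the field names and boosts are copied from the config above. Actually sending the request (for example through pysolr, whose search method accepts extra keyword parameters) is left out, since that depends on your setup.

```python
from urllib.parse import urlencode


def dismax_params(query):
    """Build Solr request parameters mirroring the handler config above."""
    return {
        "q": query,
        "defType": "dismax",                     # use the Dismax parser
        "qf": "text^0.5 title^1.5 content^0.5",  # searched fields and boosts
        "pf": "text title^2 content",            # fields rewarded for phrase hits
        "mm": "100%",                            # all tokens must match
        "ps": 100,                               # phrase slop for pf matches
        "fl": "*,score",                         # return the score with each hit
    }


params = dismax_params("caring for others")
# The encoded string could be appended to a Solr select URL.
print(urlencode(params))
```

Including score in fl makes it easier to see how each boost changes the ranking while you tune the values.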

I don't know haystack, but it seems it would provide Dismax functionality: https://github.com/toastdriven/django-haystack/pull/314

See this documentation for Dismax (it links to ExtendedDismax as well):
http://wiki.apache.org/solr/DisMaxQParserPlugin
http://wiki.apache.org/solr/ExtendedDisMax

Risadinha answered Sep 17 '22 12:09
