
I don't understand the results being returned from elasticsearch/haystack

The results being returned from haystack, using an elasticsearch backend, seem erroneous to me. My search index is as follows:

from haystack import indexes
from .models import IosVideo

class VideoIndex(indexes.SearchIndex, indexes.Indexable):                   
    text = indexes.CharField(document=True, use_template=True)              
    title = indexes.CharField(model_attr='title')                           
    absolute_url = indexes.CharField(model_attr='get_absolute_url')         
#    content_auto = indexes.EdgeNgramField(model_attr='title')              
    description = indexes.CharField(model_attr='description')               
#    thumbnail = indexes.CharField(model_attr='thumbnail_url', null=True)   

    def get_model(self):                                                    
        return IosVideo                                                     

    def index_queryset(self, using=None):                                   
        return self.get_model().objects.filter(private=False)  

My text document looks like:

{{ object.title }}
{{ object.text }}
{{ object.description }}

My query is

SearchQuerySet().models(IosVideo).filter(content="darby")[0]

The result that makes me think this is not working is a video object with the following characteristics:

title: u'Cindy Daniels'
description: u'',
text: u'Cindy Daniels\n\n\n',
absolute_url: u'/videos/testimonial/cindy-daniels/'

Why in the world would the query return such a result? I'm very confused.

My current theory is that it's tokenizing the query into every substring of its characters and using those as partial matches. Is there a way to decrease this tolerance so that only closer matches are returned?

My pip info is elasticsearch==1.2.0 django-haystack==2.3.1

And the elasticsearch version number is 1.3.1

Additionally when I hit the local server with http://localhost:9200/haystack/_search/?q=darby&pretty

It returns 10 results.

SearchQuerySet().filter(content="darby")  

Returns 4k results.

Does anyone know what would cause this type of behavior?

user133688 asked Mar 05 '15 21:03



1 Answer

There is a problem with the filter() method on CharField indexes in django-haystack 2.1.0. You can change them to NgramField instead, for example text = indexes.NgramField(document=True, use_template=True).
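To make the idea behind an n-gram field concrete, here is a toy sketch in plain Python. It is not haystack or Elasticsearch code: the function names, the minimum gram size of 3, and the "Darby Smith" title are assumptions made up for illustration, and the real analyzer settings may differ.

```python
def ngrams(word, min_gram=3, max_gram=15):
    """Return every substring of `word` with length min_gram..max_gram."""
    word = word.lower()
    grams = set()
    for size in range(min_gram, min(max_gram, len(word)) + 1):
        for start in range(len(word) - size + 1):
            grams.add(word[start:start + size])
    return grams


def index_terms(text, min_gram=3, max_gram=15):
    """Collect the n-grams of every whitespace-separated word in `text`."""
    terms = set()
    for word in text.split():
        terms |= ngrams(word, min_gram, max_gram)
    return terms


def matches(query, text):
    """True when the lowercased query is one of the text's indexed n-grams."""
    return query.lower() in index_terms(text)


# "Cindy Daniels" comes from the question; "Darby Smith" is hypothetical.
print(matches("darby", "Cindy Daniels"))
print(matches("darby", "Darby Smith"))
```

In this toy model a document only matches when the query appears as a run of at least three characters inside one of its words, so a single shared letter is not enough.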

The problem is that with this combination only the first character of the query is used, so the search returns every document whose text index field contains a 'd'.
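The effect described above can be imitated with a small plain-Python sketch. This is only a model of the symptom, not haystack's actual code path; the helper names and every title except "Cindy Daniels" are hypothetical.

```python
# "Cindy Daniels" is the title from the question; the others are made up.
titles = ["Cindy Daniels", "Darby Smith", "Tom Lee"]


def first_char_matches(query, text):
    """Model of the reported bug: only the query's first character is compared."""
    return query[0].lower() in text.lower()


def whole_word_matches(query, text):
    """Model of the expected behavior: the query must equal a whole word."""
    return query.lower() in text.lower().split()


buggy_hits = [t for t in titles if first_char_matches("darby", t)]
expected_hits = [t for t in titles if whole_word_matches("darby", t)]

print(buggy_hits)
print(expected_hits)
```

Under the first-character model, "Cindy Daniels" is a hit for "darby" simply because it contains a 'd', which would also explain a result count in the thousands.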

Ricardo Burillo answered Oct 10 '22 00:10
