Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django Haystack - Filter by substring of a field using SearchQuerySet ()

I have a Django project that uses SOLR for indexing.

I'm trying to do a substring search using Haystack's SearchQuerySet class.

For example, when a user searches for the term "ear", it should return the entry that has a field with the value: "Search". As you can see, "ear" is a SUBSTRING of "Search". (obviously :))

In other words, in a perfect Django world I would like something like:

SearchQuerySet().all().filter(some_field__contains_substring='ear')

In the haystack documentation for SearchQuerySet (https://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#field-lookups), it says that only the following FIELD LOOKUP types are supported:

  • contains
  • exact
  • gt, gte, lt, lte
  • in
  • startswith
  • range

I tried using __contains, but it behaves exactly like __exact, which looks up the exact word (the whole word) in a sentence, not a substring of a word.

I am confused, because such a functionality is pretty basic, and I'm not sure if I'm missing something, or there is another way to approach this problem (using Regex or something?).

Thanks

like image 564
Nahn Avatar asked Dec 18 '13 13:12

Nahn


2 Answers

That could be done using EdgeNgramField field:

some_field = indexes.EdgeNgramField() # also prepare value for this field or use model_attr

Then for partial match:

SearchQuerySet().all().filter(some_field='ear')
like image 60
Aamir Rind Avatar answered Nov 12 '22 13:11

Aamir Rind


It's a bug in haystack.

As you said, __exact is implemented exactly like __contains and therefore this functionality does not exists out of the box in haystack.

The fix is awaiting merge here: https://github.com/django-haystack/django-haystack/issues/1041

You can bridge the waiting time for a fixed release like this:

from haystack.inputs import BaseInput, Clean


class CustomContain(BaseInput):
    """
    An input type for making wildcard matches.
    """
    input_type_name = 'custom_contain'

    def prepare(self, query_obj):
        query_string = super(CustomContain, self).prepare(query_obj)
        query_string = query_obj.clean(query_string)

        exact_bits = [Clean(bit).prepare(query_obj) for bit in query_string.split(' ') if bit]
        query_string = u' '.join(exact_bits)

        return u'*{}*'.format(query_string)

# Usage:
SearchQuerySet().filter(content=CustomContain('searchcontentgoeshere'))
like image 40
shredding Avatar answered Nov 12 '22 12:11

shredding