
Django Haystack substring search

I recently added search to my Django-powered site so that employers can search for employees by keyword. When a user first uploads a resume, I convert it to text, strip out the stop words, and store the result in a TextField for that user. I'm using django-haystack with the Whoosh search backend.

Three questions:

1) Aside from extra features which I'll probably not use, is there any concrete advantage to switching to Solr or Xapian?

2) By turning the resume into text, I am essentially indexing the PDF myself. I know both Xapian and Solr support PDF indexing; however, from the looks of it, Haystack does not. Any tips on how to get around this? Or should I keep indexing it myself? If so, should I be doing more than simply providing a text file of keywords?

3) Whoosh only returns a result if the search term matches a keyword exactly. If a user has 'mathematics' as a keyword and I search for 'math', I want that user to appear. I couldn't tell definitively whether Xapian or Solr supports this. Thoughts?

Thanks for any suggestions. I'm going to keep digging into this myself in the meantime.

Asked by dpetters on Aug 08 '10 at 00:08


1 Answer

Unfortunately I don't know enough to answer your other questions, but on point 3, Whoosh does support this.

You would have to use the autocomplete method of SearchQuerySet, which requires the field you search on to be indexed as n-grams.

Detailed here: http://docs.haystacksearch.org/dev/autocomplete.html
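
As a rough sketch of what that setup can look like, assuming Haystack 2.x and a Django project that is already configured: the Resume model, the myapp module, the resume_text attribute, and the content_auto field name are all hypothetical names chosen for illustration, not anything from the original question. The fragment will not run outside a Django project.

```python
# search_indexes.py (sketch; assumes Haystack 2.x inside a configured Django project)
from haystack import indexes
from haystack.query import SearchQuerySet

from myapp.models import Resume  # hypothetical model holding the extracted text


class ResumeIndex(indexes.SearchIndex, indexes.Indexable):
    # Main document field, rendered from a template as in the Haystack tutorial.
    text = indexes.CharField(document=True, use_template=True)
    # Extra field indexed as edge n-grams so that prefixes such as "math" can match.
    # "resume_text" is a hypothetical attribute on the model.
    content_auto = indexes.EdgeNgramField(model_attr="resume_text")

    def get_model(self):
        return Resume


def find_resumes(prefix):
    # autocomplete() takes the n-gram field name as a keyword argument.
    return SearchQuerySet().autocomplete(content_auto=prefix)
```

After adding a field like this, the index has to be rebuilt (for example with the rebuild_index management command) before the new field returns anything.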

I'm currently using Whoosh myself and getting partial matches this way.
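
To make the idea concrete, here is a toy sketch in plain Python of the edge n-gram indexing that this kind of prefix matching typically relies on. It is not Haystack's or Whoosh's actual implementation; the function names, the sample data, and the size limits are all made up for illustration.

```python
def edge_ngrams(word, min_size=2, max_size=15):
    """Return every prefix of `word` whose length is between min_size and max_size."""
    word = word.lower()
    longest = min(len(word), max_size)
    return {word[:n] for n in range(min_size, longest + 1)}


def build_index(docs):
    """Map each prefix to the set of document ids that have a keyword starting with it."""
    index = {}
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            for gram in edge_ngrams(keyword):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query):
    """Look the lowercased query up as a single prefix; return matching document ids."""
    return index.get(query.lower(), set())


# Hypothetical sample data: user id -> keywords extracted from the resume.
docs = {"alice": ["mathematics", "python"], "bob": ["marketing"]}
index = build_index(docs)
matches = search(index, "math")  # only alice has a keyword starting with "math"
```

Note that edge n-grams only cover the start of a word, so a query like 'matics' finds nothing here. If matches in the middle of a word are needed, Haystack also provides NgramField, at the cost of a larger index.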

Answered by Chris Lefevre on Oct 02 '22 at 20:10