I recently added search to my Django-powered site so that employers can search for employees by keyword. When a user first uploads a resume, I convert it to text, strip the stop words, and store the result in a TextField for that user. I'm using Django-Haystack with the Whoosh search backend.
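A minimal sketch of the preprocessing described above, assuming the resume has already been converted to plain text. The stop-word set is a tiny illustrative sample and `extract_keywords` is a hypothetical helper name, not the code actually running on the site.

```python
import re

# Tiny illustrative stop-word sample (an assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "i", "at"}


def extract_keywords(resume_text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Keep letters, digits, '+' and '#' so tokens like "c++" and "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", resume_text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)
```

The returned string is what would be stored in the user's TextField and handed to the search index.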
Three questions:
1) Aside from extra features which I'll probably not use, is there any concrete advantage to switching to Solr or Xapian?
2) By turning the resume into text, I am essentially indexing the PDF myself. I know both Xapian and Solr support PDF indexing, but from the looks of it Haystack does not. Any tips on how to get around this? Or should I keep indexing it myself? If so, should I be doing more than simply providing a text file of keywords?
3) Whoosh only returns a result if the search term matches a keyword exactly. If a user has 'mathematics' as a keyword and I search for 'math', I want that user to appear. I couldn't tell definitively whether Xapian or Solr supports this. Thoughts?
Thanks for any suggestions. I'm going to continue digging into this myself for the time being.
Unfortunately I don't know enough to answer your other questions, but on point 3, Whoosh actually does support this.
You would have to use the autocomplete method of SearchQuerySet.
Detailed here: http://docs.haystacksearch.org/dev/autocomplete.html
I'm currently using Whoosh myself and getting partial matches this way.
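To illustrate why this works: Haystack's autocomplete is built on edge n-gram fields, which index the leading prefixes of each word so that 'math' can match 'mathematics'. The following is a plain-Python sketch of that idea only, not Haystack's or Whoosh's actual implementation. The names `edge_ngrams` and `build_index` and the sample data are made up for illustration.

```python
from collections import defaultdict


def edge_ngrams(word, min_len=2):
    """Return every prefix of `word` that has at least `min_len` characters."""
    return [word[:i] for i in range(min_len, len(word) + 1)]


def build_index(docs):
    """Map each keyword prefix to the set of users whose keywords start with it."""
    index = defaultdict(set)
    for user, keywords in docs.items():
        for word in keywords.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(user)
    return index


# Hypothetical sample data: user name -> keyword text stored for that user.
docs = {"alice": "mathematics statistics", "bob": "marketing sales"}
index = build_index(docs)

print(sorted(index.get("math", set())))  # users with a keyword starting with "math"
print(sorted(index.get("ma", set())))    # users with a keyword starting with "ma"
```

In Haystack itself, the equivalent is to declare an EdgeNgramField on the search index and query that field through SearchQuerySet's autocomplete method, as the linked page describes. The trade-off is a larger index, since every prefix of every word is stored.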