
Spelling correction in Sphinx?

I was about to integrate Sphinx-based search into my website, but I've found that it has no built-in support for spelling correction.

People on the web suggest using pspell or other third-party libraries, but the data I'm going to search contains mostly "technical" terms such as brand names, so I don't think common dictionaries will include them.

On the other hand, Xapian claims to support spelling correction based on the indexed data, which is exactly what I want. Is it worth using Xapian instead? I'm still quite confused about which full-text search engine I should use: Sphinx seems to be quite good but lacks some cool features of Xapian (or maybe Lucene?), while Xapian appears to have a smaller community and less documentation.

I think I can handle words missing from the pspell dictionary by adding a custom dictionary, but I'm not sure whether that would cause a noticeable performance hit. I'm going to use the search system for a spotlight-style search (a separate AJAX search on every letter entered) on a fairly popular website, so performance matters.

Ideally, I'd like some fields, such as brand names, to take priority over the common dictionary, but I guess that's not really important, since most brand names are quite distinct from other words.

Any suggestions on the general design of the custom full-text search engine are welcome too.

Thanks

asked May 19 '10 09:05 by htf




2 Answers

Sphinx has no built-in spelling correction, but it can be implemented on top of Sphinx. The only how-to article about this that I know of (by the Sphinx author) is at http://habrahabr.ru/blogs/sphinx/61807 (it is in Russian, but you can read it with Google Translate; see the second part of the article, titled "Я понял, это намек", roughly "I get it, that's a hint").

I implemented that method recently, and it works perfectly!
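For readers who can't follow the Russian article, here is a rough sketch of one common way to get suggestions from the indexed data itself: split every known word into character trigrams, find the words sharing the most trigrams with the misspelled query, and pick the closest one by edit distance. This is an illustration of that general idea in plain Python, not a copy of the article's implementation. In a real setup the trigram lookup would presumably live in a Sphinx index and the word list would come from your indexed data. `Suggester`, `word_freqs`, and the thresholds are hypothetical names and values chosen for the example.

```python
from collections import Counter, defaultdict


def trigrams(word):
    """Return the set of character trigrams of a word, padded with '_'."""
    padded = f"__{word.lower()}__"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


class Suggester:
    """In-memory stand-in for a trigram index (hypothetical helper)."""

    def __init__(self, word_freqs):
        # word_freqs maps each indexed word to how often it occurs.
        self.freqs = {w.lower(): f for w, f in word_freqs.items()}
        self.by_trigram = defaultdict(set)
        for word in self.freqs:
            for tri in trigrams(word):
                self.by_trigram[tri].add(word)

    def suggest(self, query, max_distance=2, max_candidates=10):
        """Return the best correction for query, or None if nothing is close."""
        query = query.lower()
        if query in self.freqs:
            return query  # already a known word
        # Count how many trigrams each known word shares with the query.
        overlap = Counter()
        for tri in trigrams(query):
            for word in self.by_trigram.get(tri, ()):
                overlap[word] += 1
        best = None
        for word, _ in overlap.most_common(max_candidates):
            dist = levenshtein(query, word)
            if dist > max_distance:
                continue
            # Prefer a smaller distance, then the more frequent word.
            key = (dist, -self.freqs[word], word)
            if best is None or key < best[0]:
                best = (key, word)
        return best[1] if best else None


if __name__ == "__main__":
    brands = Suggester({"thinkpad": 120, "macbook": 300, "latitude": 40})
    print(brands.suggest("thinkpd"))
    print(brands.suggest("macbok"))
```

Because the dictionary is built from the indexed words, brand names and other technical terms are covered automatically, and the frequency tie-break gives a simple way to favour common terms.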

answered Oct 11 '22 14:10 by seriyPS


Sphinx allows you to use morphology preprocessors and wordforms dictionaries. Combined, these could get you closer to what you want to achieve. You can read more about both topics here: http://sphinxsearch.com/docs/manual-0.9.8.html#conf-morphology and in the sections that follow it.

There are several "flavours" of morphology preprocessors available; choose the one that best fits your needs. The docs also mention the Snowball project, which can be used to add stemmers for languages other than the built-in English and Russian, if needed. The project website: http://snowball.tartarus.org/

Sphinx is a very fast full-text search engine, and using stemmers is unlikely to slow it down noticeably.

answered Oct 11 '22 13:10 by guntars