Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql full text indexing limitations?

What are the limitations or gotchas or antipatterns or pitfalls?

It seems pretty attractive, apparently you can create a search engine with almost no work. But it cannot be without its problems...

what are your experiences?

like image 995
flybywire Avatar asked Nov 29 '22 21:11

flybywire


2 Answers

it cannot be without its problems...

It certainly isn't!

Any search term composed purely of blocked words will silently fail. Words can be blocked due to min/max length restrictions and/or the stopword file.

I found the default stopword file far too aggressive, it was preventing many valid searches. Also the default minimum length of 4 was kicking in very often for acronyms people might want to search for. I reduced the ft_min_word_len to 3 and removed the stoplist completely (ft_stopword_file=''). Doc: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html

You could also examine the search query to see if it contains only <4-letter words, and fall back to a LIKE search in that case. There's no such easy way to get around the stoplist at an application level.

The selection of ‘word characters’ may not meet your needs, and it's tricky to change. For example searching for “Terry” won't match “Terry's”. In general there is no support for any kind of stemming, so “biscuit” won't match “biscuits” either.

Finally, as cg mentioned, there is no support for InnoDB. In this day and age, you don't want to be putting all your data in a MyISAM table.

If you have the storage to spare, what you can do is put the main, canonical version of the data in an InnoDB table, and then create a separate MyISAM table that contains a copy of the freetext content, purely for use as searchbait. You do have to update both tables on a change, but if the MyISAM table loses integrity then at least you only lose the ability to search over the rows concerned, instead of bumming up the real live data and getting application errors.

You can then, if you have the cycles to spare, implement your own text processing on the searchbait and query words to get around some of the above limitations. For example you can escape characters you want to be word-characters, remove characters you don't want to be word-characters, and perform simple manual English stemming.

like image 125
bobince Avatar answered Dec 15 '22 21:12

bobince


In my opinion, the greatest drawback is that the MySQL full text indexing is limited to MyISAM tables. As oppsosed to InnoDB tables, those lack a lot of important features, e.g. transactions.

like image 23
cg. Avatar answered Dec 15 '22 21:12

cg.