What are the limitations or gotchas or antipatterns or pitfalls?
It seems pretty attractive, apparently you can create a search engine with almost no work. But it cannot be without its problems...
what are your experiences?
it cannot be without its problems...
It certainly isn't!
Any search term composed purely of blocked words will silently fail. Words can be blocked due to min/max length restrictions and/or the stopword file.
I found the default stopword file far too aggressive, it was preventing many valid searches. Also the default minimum length of 4 was kicking in very often for acronyms people might want to search for. I reduced the ft_min_word_len to 3 and removed the stoplist completely (ft_stopword_file=''). Doc: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
You could also examine the search query to see if it contains only <4-letter words, and fall back to a LIKE search in that case. There's no such easy way to get around the stoplist at an application level.
The selection of ‘word characters’ may not meet your needs, and it's tricky to change. For example searching for “Terry” won't match “Terry's”. In general there is no support for any kind of stemming, so “biscuit” won't match “biscuits” either.
Finally, as cg mentioned, there is no support for InnoDB. In this day and age, you don't want to be putting all your data in a MyISAM table.
If you have the storage to spare, what you can do is put the main, canonical version of the data in an InnoDB table, and then create a separate MyISAM table that contains a copy of the freetext content, purely for use as searchbait. You do have to update both tables on a change, but if the MyISAM table loses integrity then at least you only lose the ability to search over the rows concerned, instead of bumming up the real live data and getting application errors.
You can then, if you have the cycles to spare, implement your own text processing on the searchbait and query words to get around some of the above limitations. For example you can escape characters you want to be word-characters, remove characters you don't want to be word-characters, and perform simple manual English stemming.
In my opinion, the greatest drawback is that the MySQL full text indexing is limited to MyISAM tables. As oppsosed to InnoDB tables, those lack a lot of important features, e.g. transactions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With