MySQL Full Text Search Mystery

We have a simple search on our site that uses MySQL full-text search, and for some reason it doesn't seem to be returning the correct results. I don't know if it's an issue with Amazon RDS (where our database server resides) or with the query we are running.

Here is the structure of the database table:

CREATE TABLE `items` (
  `object_id` int(9) unsigned NOT NULL DEFAULT '0',
  `slug` varchar(100) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`object_id`),
  FULLTEXT KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

And here is a simple fulltext search query on this table and the returned results:

SELECT object_id, slug, name FROM items WHERE MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE) ORDER BY name;

+-----------+-----------------------------------+------------------+
| object_id | slug                              | name             |
+-----------+-----------------------------------+------------------+
|  10146041 | us/new-hampshire/dartmouth-skiway | Dartmouth Skiway |
+-----------+-----------------------------------+------------------+

If I instead use LIKE I get a different set of results:

SELECT object_id, slug, name FROM items WHERE name LIKE "%ski%" ORDER BY name;

+-----------+------------------------------------------+----------------------------------+
| object_id | slug                                     | name                             |
+-----------+------------------------------------------+----------------------------------+
|  10146546 | us/new-york/brantling-ski                | Brantling Ski                    |
|  10146548 | us/new-york/buffalo-ski-club             | Buffalo Ski Club                 |
|  10146041 | us/new-hampshire/dartmouth-skiway        | Dartmouth Skiway                 |
|  10146352 | us/montana/discover-ski                  | Discover Ski                     |
|  10144882 | us/california/donner-ski-ranch           | Donner Ski Ranch                 |
|  10146970 | us/new-york/hickory-ski-center           | Hickory Ski Center               |
|  10146973 | us/new-york/holimont-ski-area            | Holimont Ski Area                |
|  10146283 | us/minnesota/hyland-ski                  | Hyland Ski                       |
|  10145911 | us/nevada/las-vegas-ski-snowboard-resort | Las Vegas Ski & Snowboard Resort |
|  10146977 | us/new-york/maple-ski-ridge              | Maple Ski Ridge                  |
|  10146774 | us/oregon/mount-hood-ski-bowl            | Mt. Hood Ski Bowl                |
|  10145949 | us/new-mexico/sipapu-ski                 | Sipapu Ski                       |
|  10145952 | us/new-mexico/ski-apache                 | Ski Apache                       |
|  10146584 | us/north-carolina/ski-beech              | Ski Beech                        |
|  10147973 | canada/quebec/ski-bromont                | Ski Bromont                      |
|  10146106 | us/michigan/ski-brule                    | Ski Brule                        |
|  10145597 | us/massachusetts/ski-butternut           | Ski Butternut                    |
|  10145117 | us/colorado/ski-cooper                   | Ski Cooper                       |
|  10146917 | us/pennsylvania/ski-denton               | Ski Denton                       |
|  10145954 | us/new-mexico/ski-santa-fe               | Ski Santa Fe                     |
|  10146918 | us/pennsylvania/ski-sawmill              | Ski Sawmill                      |
|  10145299 | us/illinois/ski-snowstar                 | Ski Snowstar                     |
|  10145138 | us/connecticut/ski-sundown               | Ski Sundown                      |
|  10145598 | us/massachusetts/ski-ward                | Ski Ward                         |
+-----------+------------------------------------------+----------------------------------+

I'm at a complete loss as to why the query using fulltext search is not working. I'm hoping that some MySQL expert out there can point out the error in our query.

Thanks in advance for your help!

Russell C. asked Feb 26 '23 06:02


2 Answers

From the MySQL docs:

  • + A leading plus sign indicates that this word must be present in each row that is returned.

  • * The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.

    If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix.

In Context:

MATCH(...) AGAINST(...)

MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE) means that you're searching for rows where the name column must contain a word that begins with ski.

From the set you've posted, Dartmouth Skiway is the only name that is returned: Skiway begins with ski and is long enough to be in the full-text index.

The other names contain the stand-alone word Ski, which is only three characters long. With the default ft_min_word_len of 4, words that short are never added to the index, so the prefix search has nothing to match in those rows. The row returned by your boolean search is the only one whose name contains an indexed word that begins with ski.

As suggested by ajreal, try decreasing the ft_min_word_len setting in my.cnf. Your search is probably failing to return the results you expect because of the default value of 4. Try reducing it to 3, then rebuild the FULLTEXT index.
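
One way to confirm that the minimum word length is the cause is to look at the server variable directly. This is a minimal sketch, assuming the items table from the question and its MyISAM FULLTEXT index on name:

```sql
-- Show the minimum word length used by MyISAM full-text indexes.
-- The default is 4, which means a stand-alone "Ski" is never indexed.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- With the default in place, only indexed words that start with "ski"
-- can match, which in the posted data is just "Skiway".
SELECT object_id, slug, name
FROM items
WHERE MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE)
ORDER BY name;
```

If the first statement reports 4, the single-row result from the second is expected behaviour rather than a problem with the query.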

WHERE column LIKE %text%

WHERE name LIKE "%ski%" matches rows whose name contains the substring ski anywhere, including inside a longer word.
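
If changing the server setting is not an option, a LIKE query can roughly approximate "a word that starts with ski". This is only a sketch: it assumes words in name are separated by single spaces, and it cannot use the FULLTEXT index, so it scans the table.

```sql
-- Approximate word-prefix matching with plain LIKE patterns.
SELECT object_id, slug, name
FROM items
WHERE name LIKE 'ski%'     -- the name itself starts with "ski"
   OR name LIKE '% ski%'   -- or a later, space-separated word starts with "ski"
ORDER BY name;
```

Unlike "%ski%", these patterns do not match ski in the middle of a word.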

Michael Robinson answered Feb 27 '23 18:02


The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. (See Section 5.1.4, “Server System Variables”.) The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file:

resource - http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html

configuration:

[mysqld]
ft_min_word_len=3
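
Changing the option alone is not enough: as the quoted documentation says, the FULLTEXT index has to be rebuilt afterwards. Below is a sketch of the follow-up steps, assuming the server has already been restarted with the new value and that the table is the MyISAM items table from the question. On Amazon RDS there is no my.cnf to edit, so the value would presumably be set through a DB parameter group instead; treat that as an assumption to verify for your setup.

```sql
-- Rebuild the MyISAM FULLTEXT index so that 3-character words
-- such as "Ski" are added to it under the new ft_min_word_len.
REPAIR TABLE items QUICK;

-- Re-run the original search; rows whose name contains the
-- stand-alone word "Ski" should now be returned as well.
SELECT object_id, slug, name
FROM items
WHERE MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE)
ORDER BY name;
```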

ajreal answered Feb 27 '23 18:02