We have a simple search on our site that uses MySQL fulltext search, and for some reason it doesn't seem to be returning the correct results. I don't know if it's some kind of issue with Amazon RDS (where our database server resides) or with the query we are running.
Here is the structure of the database table:
CREATE TABLE `items` (
  `object_id` int(9) unsigned NOT NULL DEFAULT '0',
  `slug` varchar(100) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`object_id`),
  FULLTEXT KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
And here is a simple fulltext search query on this table and the returned results:
select object_id ,slug,name from items where MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE) order by name;
+-----------+-----------------------------------+------------------+
| object_id | slug | name |
+-----------+-----------------------------------+------------------+
| 10146041 | us/new-hampshire/dartmouth-skiway | Dartmouth Skiway |
+-----------+-----------------------------------+------------------+
If I instead use LIKE I get a different set of results:
select object_id,slug,name from items where name LIKE "%ski%" order by name;
+-----------+------------------------------------------+----------------------------------+
| object_id | slug | name |
+-----------+------------------------------------------+----------------------------------+
| 10146546 | us/new-york/brantling-ski | Brantling Ski |
| 10146548 | us/new-york/buffalo-ski-club | Buffalo Ski Club |
| 10146041 | us/new-hampshire/dartmouth-skiway | Dartmouth Skiway |
| 10146352 | us/montana/discover-ski | Discover Ski |
| 10144882 | us/california/donner-ski-ranch | Donner Ski Ranch |
| 10146970 | us/new-york/hickory-ski-center | Hickory Ski Center |
| 10146973 | us/new-york/holimont-ski-area | Holimont Ski Area |
| 10146283 | us/minnesota/hyland-ski | Hyland Ski |
| 10145911 | us/nevada/las-vegas-ski-snowboard-resort | Las Vegas Ski & Snowboard Resort |
| 10146977 | us/new-york/maple-ski-ridge | Maple Ski Ridge |
| 10146774 | us/oregon/mount-hood-ski-bowl | Mt. Hood Ski Bowl |
| 10145949 | us/new-mexico/sipapu-ski | Sipapu Ski |
| 10145952 | us/new-mexico/ski-apache | Ski Apache |
| 10146584 | us/north-carolina/ski-beech | Ski Beech |
| 10147973 | canada/quebec/ski-bromont | Ski Bromont |
| 10146106 | us/michigan/ski-brule | Ski Brule |
| 10145597 | us/massachusetts/ski-butternut | Ski Butternut |
| 10145117 | us/colorado/ski-cooper | Ski Cooper |
| 10146917 | us/pennsylvania/ski-denton | Ski Denton |
| 10145954 | us/new-mexico/ski-santa-fe | Ski Santa Fe |
| 10146918 | us/pennsylvania/ski-sawmill | Ski Sawmill |
| 10145299 | us/illinois/ski-snowstar | Ski Snowstar |
| 10145138 | us/connecticut/ski-sundown | Ski Sundown |
| 10145598 | us/massachusetts/ski-ward | Ski Ward |
+-----------+------------------------------------------+----------------------------------+
I'm at a complete loss as to why the query using fulltext search is not working. I'm hoping that some MySQL expert out there can point out the error in our query.
Thanks in advance for your help!
+ A leading plus sign indicates that this word must be present in each row that is returned.
* The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.
If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix.
MATCH(...) AGAINST(...)
MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE)
means that you're searching for rows where the name column contains at least one indexed word that begins with ski.
From the set you've posted, Dartmouth Skiway is the only name that meets this requirement as far as the index is concerned: Skiway is an indexed word that begins with ski.
The other names contain ski only as the standalone three-letter word Ski. Assuming the server is running with the default ft_min_word_len of four characters, that word is too short to have been added to the FULLTEXT index, so the boolean search has nothing to match against. The truncation operator keeps the short search term in the query, but it cannot find words that were never indexed.
As suggested by ajreal, try decreasing the ft_min_word_len setting in my.cnf. Your search might be failing to come up with the results you expect because of the default setting of 4. Try reducing it to 3.
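WHERE column LIKE '%text%'
WHERE name LIKE "%ski%"
searches for rows with name columns that contain ski anywhere in the value, even in the middle of a word. It does not use the FULLTEXT index at all, which is why it is unaffected by the minimum word length.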
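If changing the server setting is not an option, a rough stopgap is to approximate "a word starting with ski" using LIKE alone. This is only a sketch: it assumes words in name are separated by single spaces, and it will scan the whole table rather than use an index.

```sql
-- Approximate a word-prefix match without the FULLTEXT index.
SELECT object_id, slug, name
FROM items
WHERE name LIKE 'ski%'     -- the name itself starts with "ski"
   OR name LIKE '% ski%'   -- a later, space-separated word starts with "ski"
ORDER BY name;
```

Unlike "%ski%", this would not match a name where ski appears only in the middle of a word.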
The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. (See Section 5.1.4, “Server System Variables”.) The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file:
[mysqld]
ft_min_word_len=3
Source: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
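The new value only takes effect after a server restart, and the existing index still has to be rebuilt. A minimal sketch of the rebuild step for the items table from the question, using the quick repair that the manual page above describes for MyISAM tables:

```sql
-- Rebuild the MyISAM FULLTEXT index so it picks up the new ft_min_word_len.
REPAIR TABLE items QUICK;

-- Re-run the original search to check whether three-letter words now match.
SELECT object_id, slug, name
FROM items
WHERE MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE)
ORDER BY name;
```

On Amazon RDS there is no my.cnf to edit directly. The setting would presumably be changed through the instance's DB parameter group instead, which is an assumption not covered by the manual page quoted here.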