 

How do you get your Fulltext boolean search to pick up the term C++?

So, I need to find out how to do a fulltext boolean search on a MySQL database that returns a record containing the term "C++".

I have my SQL search string as:

SELECT * 
FROM mytable 
WHERE MATCH (field1, field2, field3) 
AGAINST ("C++" IN BOOLEAN MODE) 

Although all of my fields contain the string C++, the record is never returned in the search results.

How can I modify MySQL to accommodate this? Is it possible?

The only solution I have found would be to escape the + character during the process of entering my data as something like "__plus" and then modifying my search to accommodate, but this seems cumbersome, and there has to be a better way.

Bamerza asked Feb 25 '09 06:02



1 Answer

How can I modify MySQL to accommodate this?

You'll have to change MySQL's idea of what a word is.

Firstly, the default minimum word length is 4. This means that no search term made up only of words shorter than 4 characters will ever match, whether that's ‘C++’ or ‘cpp’ (and because ‘+’ is not a word character, ‘C++’ is reduced to the one-letter word ‘C’). You can configure this using the ft_min_word_len option, e.g. in your my.cnf:

[mysqld]
ft_min_word_len=3

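(Then restart mysqld and rebuild your FULLTEXT indexes; for a MyISAM table, REPAIR TABLE mytable QUICK does this. Note that ft_min_word_len only applies to MyISAM; InnoDB FULLTEXT indexes use innodb_ft_min_token_size instead, which defaults to 3.)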
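To see why neither ‘C++’ nor ‘cpp’ matches at the default setting, here is a rough Python model of the length rule. It is only a sketch: it assumes a word is a run of letters, digits and underscores (Python's \w), which approximates but does not exactly reproduce MySQL's fulltext parser, and the helper name indexable_words is hypothetical.

```python
import re

# Approximation (assumption): a "word" is a run of letters, digits and
# underscores; any other character, including '+', separates words.
_WORD = re.compile(r"\w+")


def indexable_words(text: str, ft_min_word_len: int = 4) -> list[str]:
    """Return the words in `text` long enough to be indexed or searched."""
    return [w for w in _WORD.findall(text) if len(w) >= ft_min_word_len]


if __name__ == "__main__":
    for setting in (4, 3):
        for query in ("C++", "cpp"):
            # Prints which words of the query survive at each setting.
            print(setting, query, indexable_words(query, setting))
```

In this model, lowering ft_min_word_len to 3 rescues ‘cpp’, but ‘C++’ still reduces to the single letter ‘C’, which is why the word-character issue below matters as well.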

Secondly, MySQL does not consider ‘+’ a word character. You can make it one, but then you won't be able to search for the word ‘fish’ in the string ‘fish+chips’, so some care is required. It's also not trivial: it requires recompiling MySQL or hacking an existing character set. See the section beginning “If you want to change the set of characters that are considered word characters...” in section 11.8.6 of the doc.
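The ‘fish+chips’ trade-off can be illustrated with a toy tokenizer that optionally treats extra characters as word characters. This only models the effect of the change; it is not how MySQL itself is reconfigured (that still means a rebuild or a character-set edit), and tokenize is a hypothetical helper.

```python
import re


def tokenize(text: str, extra_word_chars: str = "") -> list[str]:
    """Split `text` into words, treating `extra_word_chars` as letters too."""
    # Toy model (assumption): word characters are letters, digits and
    # underscores (\w) plus any characters passed in `extra_word_chars`.
    pattern = r"[\w" + re.escape(extra_word_chars) + r"]+"
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(tokenize("C++ and fish+chips"))       # ['C', 'and', 'fish', 'chips']
    print(tokenize("C++ and fish+chips", "+"))  # ['C++', 'and', 'fish+chips']
```

With ‘+’ as a word character, ‘C++’ survives as one word, but ‘fish’ is no longer a separate word in ‘fish+chips’.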

escape the + character during the process of entering my data as something like "__plus" and then modifying my search to accommodate

Yes, something like that is a common solution. You can keep your ‘real’ data (without the escaping) in a primary, definitive table, usually using InnoDB for ACID compliance. Then add an auxiliary MyISAM table (InnoDB did not support FULLTEXT indexes before MySQL 5.6) containing only the mangled words, as bait for the fulltext search. You can also do a limited form of stemming using this approach.
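A minimal sketch of the mangling step in Python, assuming the asker's "__plus" convention. The regular expression only rewrites a run of ‘+’ that ends a word (as in ‘C++’ or ‘g++’), so ‘fish+chips’ is left alone. The function name mangle and the exact rule are illustrative assumptions, not a MySQL feature; the same function would have to be applied both to the text copied into the auxiliary table and to the user's search string.

```python
import re

# Hypothetical rule: a run of '+' that ends a word ("C++", "g++"), i.e. is not
# followed by a word character or another '+'. A '+' between words
# ("fish+chips") is kept as-is.
_TRAILING_PLUS = re.compile(r"(\w+)(\++)(?![\w+])")


def mangle(text: str) -> str:
    """Replace each trailing '+' of a word with '__plus'."""
    return _TRAILING_PLUS.sub(
        lambda m: m.group(1) + "__plus" * len(m.group(2)), text
    )


if __name__ == "__main__":
    # Use the same function for the auxiliary table's text and for searches.
    print(mangle("Senior C++ developer, knows g++."))
    # Senior C__plus__plus developer, knows g__plus__plus.
    print(mangle("fish+chips"))  # fish+chips
```

Because MySQL's parser treats underscores as word characters, ‘C__plus__plus’ is indexed as a single word, and it is long enough to clear the default minimum word length.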

Another possibility is to detect searches that MySQL's fulltext search can't handle, such as those containing only short words or unusual characters, and fall back to a simple but slow LIKE or REGEXP search for those searches only. In this case you will probably also want to disable the stopword list by setting ft_stopword_file to an empty string, since it isn't practical to detect every stopword as a special case too.
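One way to sketch that routing in Python, assuming a DB-API driver that uses the %s placeholder style (as MySQLdb and PyMySQL do), the table and column names from the question, and the default minimum word length of 4. It only checks word length and characters; it does not check stopwords (the answer suggests disabling the stoplist instead), and it uses LIKE rather than REGEXP. The helper names are hypothetical.

```python
import re

# Approximation (assumption): words MySQL's fulltext index can find consist of
# letters, digits and underscores only.
_WORD = re.compile(r"\w+")
FIELDS = ("field1", "field2", "field3")  # columns from the question


def fulltext_can_handle(term: str, min_word_len: int = 4) -> bool:
    """True if every word in `term` is plain and long enough for MATCH."""
    words = term.split()
    return bool(words) and all(
        _WORD.fullmatch(w) is not None and len(w) >= min_word_len for w in words
    )


def escape_like(value: str) -> str:
    """Escape LIKE wildcards using MySQL's default escape character (backslash)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_search(term: str, min_word_len: int = 4) -> tuple[str, tuple[str, ...]]:
    """Return (sql, params): a fulltext query, or a LIKE fallback."""
    if fulltext_can_handle(term, min_word_len):
        sql = (f"SELECT * FROM mytable WHERE MATCH ({', '.join(FIELDS)}) "
               "AGAINST (%s IN BOOLEAN MODE)")
        return sql, (term,)
    # Fallback: substring match on every column, one placeholder per column.
    pattern = "%" + escape_like(term) + "%"
    where = " OR ".join(f"{field} LIKE %s" for field in FIELDS)
    return f"SELECT * FROM mytable WHERE {where}", (pattern,) * len(FIELDS)


if __name__ == "__main__":
    sql, params = build_search("C++")
    print(sql)
    print(params)
    # With an open MySQLdb/PyMySQL cursor: cursor.execute(sql, params)
```

Keeping the values in params rather than in the SQL string lets the driver handle quoting, so a term like ‘C++’ reaches MySQL unchanged.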

bobince answered Oct 09 '22 09:10