Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

More efficient word boundary query in mySQL

Tags:

regex

mysql

I have a table with 1/2 million phrases and I am doing word matching using this query:

SELECT * FROM `searchIndex` WHERE `indexData` RLIKE '[[:<:]]Hirt'

The indexData field has a FULLTEXT index and is datatype longtext.

I want to match on items like

"Alois M. Hirt"

"Show Biz - Al Hirt, in a new role, ..."

"Al Hirt's Sinatraville open 9 p..."

"Hirt will be playing..."

and not on "shirt" or "thirteen" or "thirty" etc.

The query is succeeding but it frequently takes 3 seconds to return and I wondered if there was a better, more efficient way of doing this word boundary match?

If I were to add another index to indexData what would be the correct keylength to use?

TIA

like image 509
jerrygarciuh Avatar asked Aug 31 '11 18:08

jerrygarciuh


1 Answers

No need to have a FULLTEXT index. MySQL has special markers for word boundaries. From the MySQL doc:

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';   -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';  -> 0
like image 110
Renaud Avatar answered Sep 19 '22 23:09

Renaud