Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the rules to order the keywords in a MySQL boolean search?

When I change the order of the keywords in a boolean search, I get the same result but very different performance results.

The profiling on MySQL 5.6.33 with a MyISAM table, ft_min_word_len=2 and description_index as a FULLTEXT index on title and description returns this:

# Query 1
SELECT id
FROM archive, topic
WHERE topic.type=0 AND archive.status=2
AND MATCH(title, description) AGAINST ('+house* +tz*' IN BOOLEAN MODE)
AND archive.topicId = topic.id
ORDER BY archive.featured DESC, archive.submissionDate DESC LIMIT 0,20

Result:

Total count: 12
Key_read_requests: 2384607
Creating sort index: 7.950430 sec (!)
Duration: 8.851252 sec

# Query 2
SELECT id
FROM archive, topic
WHERE topic.type=0 AND archive.status=2
AND MATCH(title, description) AGAINST ('+tz* +house*' IN BOOLEAN MODE)
AND archive.topicId = topic.id
ORDER BY archive.featured DESC, archive.submissionDate DESC LIMIT 0,20

Result:

Total count: 12
Key_read_requests: 415
Creating sort index: 0.003449
Duration: 0.004054 sec

Total records per keyword:

tz*: 135092
tz: 25596
house*: 12

Explain is the same for both queries:

id | select_type | Table   | Type     | Key               | Key len | Ref             | Rows | Extra
1  | SIMPLE      | archive | fulltext | description_index | 0       |                 | 1    | Using where; Using filesort
1  | SIMPLE      | topic   | eq_ref   | PRIMARY           | 3       | archive.topicId | 1    | Using where

Only Key_read_requests and Creating sort index are different between the 2 queries.

It seems that:

  1. the order of the keyword order is a critical performance factor
  2. the keywords are used in reverse order
  3. having the most discriminating keyword at the end improves the performance.

Questions:

  • What is the reason of this big performance difference?
  • What are the rules/best practices? (I could not find anything in the documentation of mysql).
like image 364
Toto Avatar asked Jan 21 '17 20:01

Toto


1 Answers

Edit after OP comment:
I'm not sure about the exact query plan when resolving this query.
Sometimes one operation is more expensive than another therefore doing the less expensive operation first can sort out many rows that then don't have to go through the more expensive operation which leads to a reduced running time.
(In your example one of the matching-operations could be more expensive than the other which increases and reduces running time by changing the order of the strings to match against).

like image 178
iLikeMySql Avatar answered Nov 03 '22 23:11

iLikeMySql