Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Improve performance on MySQL fulltext search query

I have a following MySQL query:

SELECT p.*, MATCH (p.description) AGAINST ('random text that you can use in sample web pages or typography samples') AS score 
FROM posts p 
WHERE p.post_id <> 23 
AND MATCH (p.description) AGAINST ('random text that you can use in sample web pages or typography samples') > 0 
ORDER BY score DESC LIMIT 1

With 108,000 rows, it takes ~200ms. With 265,000 rows, it takes ~500ms.

Under performance testing(~80 concurrent users) it shows ~18sec average latency.

Is any way to improve performance for this query ?

EXPLAIN OUTPUT:

enter image description here

UPDATED

We have added one new mirror MyISAM table with post_id, description and synchronized it with posts table via triggers. Now, fulltext search on this new MyISAM table works ~400ms(with the same performance load where InnoDB shows ~18sec.. this is a huge performance boost) Look like MyISAM is much more quicker for fulltext in MySQL than InnoDB. Could you please explain it ?

MySQL profiler results:

Tested on AWS RDS db.t2.small instance

Original InnoDB posts table:

enter image description here

MyISAM mirror table with post_id, description only:

enter image description here

like image 658
alexanoid Avatar asked Jul 02 '15 10:07

alexanoid


People also ask

What is advantage of fulltext over like for performing text search in MySQL?

Like uses wildcards only, and isn't all that powerful. Full text allows much more complex searching, including And, Or, Not, even similar sounding results (SOUNDEX) and many more items.

What is fulltext search in MySQL?

Full-text search is a searching technique used to examine all words in the document that might not perfectly match the search criteria. The records contain textual data like product descriptions, blog posts, articles, etc.


2 Answers

Here are a few tips what to look for in order to maximise the speed of such queries with InnoDB:

  1. Avoid redundant sorting. Since InnoDB already sorted the result according to ranking. MySQL Query Processing layer does not need to sort to get top matching results.

  2. Avoid row by row fetching to get the matching count. InnoDB provides all the matching records. All those not in the result list should all have ranking of 0, and no need to be retrieved. And InnoDB has a count of total matching records on hand. No need to recount.

  3. Covered index scan. InnoDB results always contains the matching records' Document ID and their ranking. So if only the Document ID and ranking is needed, there is no need to go to user table to fetch the record itself.

  4. Narrow the search result early, reduce the user table access. If the user wants to get top N matching records, we do not need to fetch all matching records from user table. We should be able to first select TOP N matching DOC IDs, and then only fetch corresponding records with these Doc IDs.

I don't think you cannot get that much faster looking only at the query itself, maybe try removing the ORDER BY part to avoid unnecessary sorting. To dig deeper into this, maybe profile the query using MySQLs inbuild profiler.

Other than that, you might look into the configuration of your MySQL server. Have a look at this chapter of the MySQL manual, it contains some good informations on how to tune the fulltext index to your needs.

If you've already maximized the capabilities of your MySQL server configuration, then consider looking at the hardware itself - sometimes even a lost cost solution like moving the tables to another, faster hard drive can work wonders.

like image 109
Bjoern Avatar answered Oct 20 '22 21:10

Bjoern


My best guess for the performance hit is the number of rows being returned by the query. To test this, simply remove the order by score and see if that improves the performance.

If it does not, then the issue is the full text index. If it does, then the issue is the order by. If so, the problem becomes a bit more difficult. Some ideas:

  • Determine a hardware solution to speed up the sorts (getting the intermediate files to be in memory).
  • Modifying the query so it returns fewer values. This might involve changing the stop-word list, changing the query to boolean mode, or other ideas.
  • Finding another way of pre-filtering the results.
like image 42
Gordon Linoff Avatar answered Oct 20 '22 22:10

Gordon Linoff