Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How much does decoupling tables in MySQL improve performance?

I am designing the database to store some blog posts in MySQL. I recently ran across this answer which advises that when you have:

  1. A table that will be queried regularly (e.g. a listing of blog posts), but
  2. One column in that table holds a large amount of data that will not be regularly accessed (the blog content)

Then it'd be better for performance if you store that the content in a separate table, so when you generate the listing, it'd be quicker.

CREATE TABLE article (
    id INT(10) UNSIGNED,
    title VARCHAR(40),
    author_id INT(10) UNIGNED,
    created DATETIME,
    modified DATETIME
);

CREATE TABLE article_text (
    id INT(10) UNSIGNED,
    body TEXT
);

Does this affect performance even if the column is not part of the query:

SELECT id, title FROM article WHERE author_id=33 ORDER BY created DESC LIMIT 5

And at what scale does it become a performance issue? (A few hundred, thousands? Millions?)

like image 881
dayuloli Avatar asked Mar 16 '23 08:03

dayuloli


1 Answers

With MySQL 5.5 and later, the InnoDB storage engine supports the Barracuda file format. In order to use the Barracuda file format for InnoDB, you must use a file-per-table tablespace, or use a generic table space (the single-file "system" InnoDB tablespace does NOT support Barracuda).

Prior to Barracuda (Antelope), MySQL always stored at least the first 768 bytes of a TEXT column in the clustered (primary key) index. In that case, having a TEXT column, even when not referencing it, increased the size of each row in the clustered index (leaf nodes). This slowed table scanning on the other non-TEXT columns since less pages fit on each page (more pages to scan on average to find what you're looking for), yet increased performance when scanning the TEXT column (the first 768 bytes anyway). Are you doing a lot of table scanning? Hopefully you're able to use indexes to avoid table scanning.

Indexes are b-trees and key searches are done at the internal nodes, which contain only the key. For the clustered index, that's only the primary key, so clustered index searches are unaffected by the amount of data in the leaf node (but are affected by primary key size).

With the Barracuda file system for InnoDB, the entire TEXT column is stored in overflow pages (which can be compressed). No part of it is stored in the clustered index (leaf nodes). So, if you're using the Barracuda file system, let MySQL do the separation for you and just put the TEXT column in the same table. You're not gaining anything in that case, since MYSQL is already putting the TEXT column somewhere else and isn't affecting scanning the non-TEXT columns.

If you are using Antelope, you might consider splitting it if you are scanning the non-TEXT columns frequently (which you should avoid anyway), and you rarely referenced the TEXT column. Splitting it into two tables, to read the whole record, you now have to do searches on two clustered indexes, which is twice as expensive as one.

I could also see wanting to split it with Antelope on a system where MySQL had limited RAM for the InnoDB buffer space, and you rarely referenced the TEXT column. The leaf nodes that contained the non-TEXT columns would be more likely to stay in memory if they were smaller.

like image 85
Marcus Adams Avatar answered Mar 19 '23 04:03

Marcus Adams