MySQL partitioning vs indexing performance

Tags:

mysql

innodb

In MySQL InnoDB, is there a performance advantage to partitioning a table compared to simply using an index?

Pinch asked Sep 21 '25 05:09


2 Answers

Common considerations:

Is an Index the Best Solution?

An index isn’t always the right tool. At a high level, keep in mind that indexes are most effective when they help the storage engine find rows without adding more work than they avoid. For very small tables, it is often more effective to simply read all the rows in the table. For medium to large tables, indexes can be very effective. For enormous tables, the overhead of indexing, as well as the work required to actually use the indexes, can start to add up. In such cases you might need to choose a technique that identifies groups of rows that are interesting to the query, instead of individual rows. You can use partitioning for this purpose.

If you have lots of tables, it can also make sense to create a metadata table to store some characteristics of interest for your queries. For example, if you execute queries that perform aggregations over rows in a multitenant application whose data is partitioned into many tables, you can record which users of the system are actually stored in each table, thus letting you simply ignore tables that don’t have information about those users. These tactics are usually useful only at extremely large scales. In fact, this is a crude approximation of what Infobright does. At the scale of terabytes, locating individual rows doesn’t make sense; indexes are replaced by per-block metadata.

One thing is sure: you can’t scan the whole table every time you want to query it, because it’s too big. And you don’t want to use an index because of the maintenance cost and space consumption. Depending on the index, you could get a lot of fragmentation and poorly clustered data, which would cause death by a thousand cuts through random I/O. You can sometimes work around this for one or two indexes, but rarely for more. Only two workable options remain: your query must be a sequential scan over a portion of the table, or the desired portion of the table and index must fit entirely in memory.

It’s worth restating this: at very large sizes, B-Tree indexes don’t work. Unless the index covers the query completely, the server needs to look up the full rows in the table, and that causes random I/O a row at a time over a very large space, which will just kill query response times. The cost of maintaining the index (disk space, I/O operations) is also very high. Systems such as Infobright acknowledge this and throw B-Tree indexes out entirely, opting for something coarser-grained but less costly at scale, such as per-block metadata over large blocks of data.

This is what partitioning can accomplish, too. The key is to think about partitioning as a crude form of indexing that has very low overhead and gets you in the neighborhood of the data you want. From there, you can either scan the neighborhood sequentially, or fit the neighborhood in memory and index it. Partitioning has low overhead because there is no data structure that points to rows and must be updated; partitioning doesn’t identify data at the precision of rows, and has no data structure to speak of. Instead, it has an equation that says which partitions can contain which categories of rows.
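
To make the "equation that says which partitions can contain which categories of rows" concrete, here is a minimal Python sketch of range-partition lookup. It is only an illustration, not how MySQL implements pruning; the partition names and date bounds are hypothetical and are modeled loosely on `PARTITION BY RANGE ... VALUES LESS THAN`.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical range partitions. Each bound is exclusive, in the spirit of
# VALUES LESS THAN (...); "pmax" catches every later date.
PARTITION_BOUNDS = [date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]
PARTITION_NAMES = ["p2022", "p2023", "p2024", "pmax"]


def partition_for(day: date) -> str:
    """Return the single partition that can contain a row with this date."""
    return PARTITION_NAMES[bisect_right(PARTITION_BOUNDS, day)]


def partitions_for_range(start: date, end: date) -> list[str]:
    """Return every partition that may hold rows with start <= day <= end."""
    first = bisect_right(PARTITION_BOUNDS, start)
    last = bisect_right(PARTITION_BOUNDS, end)
    return PARTITION_NAMES[first:last + 1]


# A query over Nov 2023 .. Feb 2024 only needs to look at two partitions.
print(partitions_for_range(date(2023, 11, 1), date(2024, 2, 1)))
```

Note that nothing here points at individual rows: the only state is a short, sorted list of bounds, which is why maintaining it costs almost nothing as rows are inserted.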

(Many thanks to the great book High Performance MySQL.)

RomanPerekhrest answered Sep 22 '25 21:09


99% of cases I have looked at do not benefit from PARTITIONing as much as from INDEXing.

My Rules of Thumb for using Partitioning are in http://mysql.rjweb.org/doc.php/partitionmaint . That page also lists the only 4 use cases where partitioning improves performance.

OK, I can't say "exactly" 99%, but it is very close to that. I do believe strongly in the "4" -- I have been searching since partitioning was added to MySQL many years ago.

For Data Warehousing, the usual performance solution is to create and maintain "Summary tables". This works nicely for 'most' DW applications.
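
As a rough sketch of the summary-table idea, the Python snippet below keeps a small per-day aggregate alongside a fact table. It uses the standard-library `sqlite3` module purely as a stand-in for MySQL, and the table names (`sales`, `sales_daily`) and columns are hypothetical.

```python
import sqlite3

# SQLite in memory stands in for MySQL; the schema is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.execute(
    "CREATE TABLE sales_daily (day TEXT PRIMARY KEY, total INTEGER, cnt INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-01", 15), ("2024-01-02", 7)],
)


def summarize_day(day: str) -> None:
    """Rebuild the summary row for one day from the fact table."""
    conn.execute("DELETE FROM sales_daily WHERE day = ?", (day,))
    conn.execute(
        "INSERT INTO sales_daily "
        "SELECT day, SUM(amount), COUNT(*) FROM sales WHERE day = ? GROUP BY day",
        (day,),
    )


def total_for(day: str) -> int:
    """Answer a report query from the summary table, not the fact table."""
    row = conn.execute(
        "SELECT total FROM sales_daily WHERE day = ?", (day,)
    ).fetchone()
    return row[0] if row else 0


for d in ("2024-01-01", "2024-01-02"):
    summarize_day(d)

print(total_for("2024-01-01"))
```

Reports then read one small row per day instead of scanning the raw rows; how and when the summary is refreshed (per batch, nightly, on insert) is a design choice this sketch does not cover.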

"Very large BTrees don't work"? Bull. A million-row index will have a BTree depth of about 3. A trillion rows -- about 6. Where's the "won't work"? A "point query" on a trillion-row table will touch twice as many nodes in the BTree, and more of them are likely to be uncached. But it "will work".

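The depth figures above can be checked with a little arithmetic. This is a minimal sketch that assumes each BTree node holds about 100 entries; that fanout is an assumption for illustration, and the real value depends on key size and page size.

```python
def btree_depth(rows: int, fanout: int = 100) -> int:
    """Smallest depth such that fanout**depth >= rows (integer arithmetic only)."""
    depth, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout  # each extra level multiplies reachable entries
        depth += 1
    return depth


for rows in (10**6, 10**9, 10**12):
    print(f"{rows:>20,} rows -> depth {btree_depth(rows)}")
```

With the assumed fanout of 100, a million rows gives a depth of 3 and a trillion rows gives 6, so a million-fold increase in table size only doubles the number of levels a point query has to walk.
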
Infobright, with its "columnar storage", has its niche. TokuDB, with its "fractal indexing", has its niche. Neither one can say "we are better than BTrees most of the time". (Both those engines get part of their speed by compression.)

Bottom Line: Use an index. Probably a "composite" index. (More indexing tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql )

Rick James answered Sep 22 '25 19:09