Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL: OPTIMIZE TABLE needed on table with fixed columns?

I have a weekly script that moves data from our live database and puts it into our archive database, then deletes the data it just archived from the live database. Since it's a decent size delete (about 10% of the table gets trimmed), I figured I should be running OPTIMIZE TABLE after this delete.

However, I'm reading this from the mysql documentation and I don't know how to interpret it: http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html

"OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file."

The first sentence is ambiguous to me. Does it mean you should run it if: A) you have deleted a large part of a table with variable-length rows or if you have made many changes to a table with variable-length rows OR B) you have deleted a large part of ANY table or if you have made many changes to a table with variable-length rows

Does that make sense? So if my table has no VAR columns, do I need to run it still?

While we're on the subject - is there any indicator that tells me that a table is ripe for an OPTIMIZE call?

Also, I read this http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/ that says running OPTIMIZE table only is useful for the primary key. If most of my selects are from other indices, am I just wasting effort on tables that have a surrogate key?

Thanks so much!

like image 394
Shane N Avatar asked Aug 02 '10 21:08

Shane N


2 Answers

In your scenario, I do not believe that regularly optimizing the table will make an appreciable difference.

First things first, your second interpretation (B) of the documentation is correct - "if you have deleted a large part of ANY table OR if you have made many changes to a table with variable-length rows."

If your table has no VAR columns, each record, regardless of the data it contains, takes up the exact same amount of space in the table. If a record is deleted from the table, and the DB chooses to reuse the exact area the previous record was stored, it can do so without wasting any space or fragmenting your data.

As far as whether OPTIMIZE only improves performance on a query that utilizes the primary key index, that answer would almost certainly vary based on what storage engine is in use, and I'm afraid I wouldn't be able to answer that.

However, speaking of storage engines, if you do end up using OPTIMIZE, be aware that it doesn't like to run on InnoDB tables, so the command maps to ALTER and rebuilds the table, which might be a more expensive operation. Either way, the table locks during the optimizations, so be very careful about when you run it.

like image 81
Ryan Tenney Avatar answered Sep 28 '22 00:09

Ryan Tenney


There are so many differences between MyISAM and InnoDB, I am splitting this answer in two:

MyISAM

  • FIXED has some meaning for MyISAM.
  • "Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions" applies to MyISAM, not InnoDB. Hence, for MyISAM tables with a lot of churn, OPTIMIZE can be beneficial.
  • In MyISAM, VAR plus DELETE/UPDATE leads to fragmentation.
  • Because of the linked list and VAR, a single row can be fragmented across the data file (.MYD). (Otherwise, a MyISAM row is contiguous in the data file.)

InnoDB

  • FIXED has no meaning for InnoDB tables.
  • For VAR in InnoDB, there are "block splits", not a linked list.
  • In a BTree, block splits stabilize at and average 69% full. So, with InnoDB, almost any abuse will leave the table not too bloated. That is, DELETE/UPDATE (with or without VAR) leads to the more limited BTree 'fragmentation'.
  • In InnoDB, emptied blocks (16KB each) are put on a "free list" for reuse; they are not given back to the OS.
  • Data in InnoDB is ordered by the PRIMARY KEY, so deleting a row in one part of the table does not provide space for a new row in another part of the table. But, when a block is freed up, it can be used elsewhere.
  • Two adjacent blocks that are half empty will be coalesced, thereby freeing up a block.

Both

  • If you are removing "old" data (your 10%), then PARTITIONing is a much better way to do it. See my blog. It involves DROP PARTITION, which is instantaneous and gives space back to the OS, plus REORGANIZE PARTITION, which can be instantaneous.
  • OPTIMIZE TABLE is almost never worth doing.
like image 24
Rick James Avatar answered Sep 28 '22 01:09

Rick James