How to optimize a 'col = col + 1' UPDATE query that runs on 100,000+ records?

See this previous question for some background. I'm trying to renumber a corrupted MPTT tree using SQL. The script works correctly; it's just much too slow.

I repeatedly need to execute these two queries:

UPDATE `tree`
SET    `rght` = `rght` + 2
WHERE  `rght` > currentLeft;

UPDATE `tree`
SET    `lft` = `lft` + 2
WHERE  `lft` > currentLeft;

The table is defined as follows:

CREATE TABLE `tree` (

  `id`        char(36) NOT NULL DEFAULT '',
  `parent_id` char(36) DEFAULT NULL,
  `lft`       int(11) unsigned DEFAULT NULL,
  `rght`      int(11) unsigned DEFAULT NULL,
  ... (a couple more columns) ...,

  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`),
  KEY `lft` (`lft`),
  KEY `rght` (`rght`),
  ... (a few more indexes) ...

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The database is MySQL 5.1.37. There are currently ~120,000 records in the table. Each of the two UPDATE queries takes roughly 15-20 seconds to execute. The WHERE condition may apply to a majority of the records, so almost all records need to be updated each time. In the worst case, both queries are executed as many times as there are records in the database.

Is there a way to optimize these queries by keeping the values in memory, delaying writes to disk, delaying index updates, or something along these lines? The bottleneck appears to be hard disk throughput right now, as MySQL seems to be writing everything back to disk immediately.

Any suggestions appreciated.

asked Sep 06 '10 by deceze

2 Answers

I've never used it, but if you have enough memory, try a MEMORY table.

Create a table with the same structure as tree, copy the data over with INSERT INTO ... SELECT, run your scripts against the memory table, and write the results back, as sketched below.
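A minimal sketch of that workflow, assuming none of the elided columns are BLOB/TEXT (which the MEMORY engine doesn't support); tree_mem is a hypothetical name:

-- Build an in-memory copy of the table
CREATE TABLE `tree_mem` LIKE `tree`;
ALTER TABLE `tree_mem` ENGINE=MEMORY;

-- Copy the data over
INSERT INTO `tree_mem` SELECT * FROM `tree`;

-- ... run the renumbering updates against tree_mem instead of tree ...

-- Write the renumbered rows back and clean up
TRUNCATE TABLE `tree`;
INSERT INTO `tree` SELECT * FROM `tree_mem`;
DROP TABLE `tree_mem`;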

answered Sep 23 '22 by andrem


Expanding on some ideas from the comments, as requested:

InnoDB's default is to flush the log to disk after every commit. You can wrap multiple updates in a single transaction, or change this parameter:

http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
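A sketch of both options; note that a value of 2 flushes the log to the OS on each commit but to disk only about once per second, so up to a second of committed transactions can be lost in a crash, and the literal 1000 is just a stand-in for currentLeft:

-- Relax the flush-per-commit behavior (requires the SUPER privilege)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Or batch many small updates into a single commit, and thus a single flush
START TRANSACTION;
UPDATE `tree` SET `rght` = `rght` + 2 WHERE `rght` > 1000;
UPDATE `tree` SET `lft`  = `lft`  + 2 WHERE `lft`  > 1000;
-- ... more renumbering updates ...
COMMIT;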

The isolation level is simple to change; just make sure the level fits your design. It probably won't help here, because a range update is being used, but it's good to know about when looking for more concurrency:

http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
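For example, to lower it just for the session running the script:

-- Affects subsequent transactions in this session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;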

Ultimately, after noticing the range update in the query, your best bet is the MEMORY table that andrem pointed out. You'll probably also find some performance by using BTREE indexes instead of the MEMORY engine's default of hash:

http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/
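This matters because HASH indexes can't serve range conditions like rght > currentLeft. A sketch of the in-memory table with explicit BTREE indexes, using the columns from the question (the elided extra columns omitted):

CREATE TABLE `tree_mem` (
  `id`        char(36) NOT NULL DEFAULT '',
  `parent_id` char(36) DEFAULT NULL,
  `lft`       int(11) unsigned DEFAULT NULL,
  `rght`      int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`),
  -- BTREE, not the MEMORY default of HASH, so the range WHERE clauses can use the index
  KEY `lft`  (`lft`)  USING BTREE,
  KEY `rght` (`rght`) USING BTREE
) ENGINE=MEMORY DEFAULT CHARSET=utf8;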

answered Sep 23 '22 by Rob Olmos