Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this update-with-join mysql query so slow?

I have an application that needs to update nodes in a hierarchical structure, upwards from a particular node whose ID is known. I use the following MySQL statement to do this:

update node as A 
join node as B 
   on A.lft<=B.lft and A.rgt>=B.rgt 
set A.count=A.count+1 where B.id=?

The table has a primary key on id, and indexes on lft and rgt. The statement works, but I found that it had performance problems. Looking at the EXPLAIN results for the corresponding select statement, I saw that the number of rows inspected for the "B" table was very large (maybe the entire table).

I can easily pull the query apart into two separate ones:

select lft, rgt from node where id=?
LFT=result.lft
RGT=result.rgt
update node set count=count+1 where lft<=LFT and rgt>=RGT

But why does the original statement not perform as expected, and how would I need to reformulate it to work better?

By request, here's an abbreviated version of the create table:

CREATE TABLE `node` ( 
`id` int(11) NOT NULL auto_increment, 
`name` varchar(255) NOT NULL, 
`lft` decimal(64,0) NOT NULL, 
`rgt` decimal(64,0) NOT NULL, 
`count` int(11) NOT NULL default '0', 
PRIMARY KEY (`id`), 
KEY `name` (`name`), 
KEY `location` (`location`(255)), 
KEY `lft` (`lft`), 
KEY `rgt` (`rgt`), 
) ENGINE=InnoDB

I have not tried to add the composite index (actually, I don't have the access level required to do that on the spot); but I don't see how it would help, trying to think through how the database engine would try to resolve the dual inequalities.

like image 547
plantrob Avatar asked Dec 04 '22 09:12

plantrob


1 Answers

You can "force" (at least up to 5.5, version 5.6 has sevral improvements on the optimizer which may make this rewriting redundant) MySQL to evaluate first the conditions on table B by taking the first part of your split into as a subquery and then using this as a derived table and joining to table A:

UPDATE node AS a 
  JOIN 
    ( SELECT lft, rgt
      FROM node
      WHERE id = ? 
    ) AS b 
    ON  a.lft <= b.lft 
    AND a.rgt >= b.rgt
SET 
    a.count = a.count + 1 ; 

Efficiency will still depend on which of the two indexes is chosen to limit the rows to be updated. Still after the use of any of these 2 indexes, table lookups are needed to check the other column. So, I suggest you add a composite index on (lft, rgt) and one on (rgt, lft) so only one index is used to find which rows shall be updated.

I suppose that you are using the Nested Set and the efficiency of this update will not be great on a big table as the query has 2 range conditions and that limits the efficiency of B-tree indexes.

like image 175
ypercubeᵀᴹ Avatar answered Dec 14 '22 23:12

ypercubeᵀᴹ