
Executing MySQL REPLACE INTO ... SELECT across multiple DBs results in deadlock

I've checked other similar issues, e.g. "deadlock in MySQL" on Stack Overflow, but nothing led to a solution.

REPLACE INTO db2.table2 (id, some_identifier_id, name, created_at, updated_at)
    (SELECT id, some_identifier_id, name, created_at, updated_at
     FROM db1.table1
     WHERE some_identifier_id IS NOT NULL
       AND some_identifier_id NOT IN
           (SELECT some_identifier_id
            FROM db2.table1
            WHERE some_other_identifier_id IS NOT NULL));

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Situation:

  1. All the tables are InnoDB; db1.table1 uses collation latin1_swedish_ci, while db2 uses utf8_unicode_ci.
  2. The query works fine on the development server, whose version is 10.0.15-MariaDB.
  3. Assume I have 5 DB servers sharing multi-master replication via a Galera cluster.
  4. I'm executing the query manually on any of those 5 servers and getting the error.
  5. That server's version is the same as the dev server's where the query succeeds, i.e. 10.0.15-MariaDB.

Tried:

  1. Added LOCK IN SHARE MODE to the outer SELECT, e.g. REPLACE INTO ... (SELECT ... LOCK IN SHARE MODE); it failed with the same message.
  2. Added LOCK IN SHARE MODE to both the outer SELECT and the subquery; it also failed with the same message.
  3. Tried ordering by id in the SELECT / sub-SELECT. Again, it failed with the same message.
  4. db1.table1 and db2.table1 each have only about 50k records, so table size shouldn't be the issue.
  5. All the tables have id as an auto-increment primary key, but I'm inserting the id values explicitly, as you can see in the query above.
  6. SHOW ENGINE INNODB STATUS; doesn't give me any useful hint.

The most probable cause is the optimistic locking that Galera's multi-master replication uses (http://www.severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads). But shouldn't that be irrelevant when I execute the query on an individual node? Although I'll eventually have to run the same query across the multi-master setup, I'd guess that once the basic issue is solved, the replicated servers won't cause problems anymore.
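One way to narrow this down on the node being tested is to check whether wsrep replication is actually active there, since Galera's certification-based (optimistic) conflicts only occur when the node participates in the cluster. These are standard Galera status/variable names:

```sql
-- Is this node part of an active Galera cluster, and of what size?
SHOW STATUS LIKE 'wsrep_cluster_size';
-- Is wsrep replication enabled on this node?
SHOW VARIABLES LIKE 'wsrep_on';
```

If wsrep_on is ON even when you connect to a "single" server, the query is still subject to cluster-wide certification, which would explain seeing the deadlock there too.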

Note:

I need to do this without any temp table and without storing the subquery's result in code. Due to some other dependencies, executing a single query is the most favourable approach so far.

asked Oct 19 '22 by Kuntal Chandra


1 Answer

Okay, I found a workaround. Based on my research and tests, I think there are two issues behind this failure.

  1. The REPLACE INTO query syncs id along with the other required fields from db1.table1 to db2.table2. Inserting/replacing an auto-increment primary key is the most probable and obvious cause of deadlocks in Galera. I removed id from the query and made some_identifier_id a unique key so the same REPLACE semantics still work, and that almost entirely stopped the deadlock error.

Do not rely on auto-increment values to be sequential. Galera uses a mechanism based on autoincrement increment to produce unique non-conflicting sequences, so on every single node the sequence will have gaps. https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/
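The adjusted query might look like the following sketch, using the table and column names from the question; the unique-key name uniq_some_identifier_id is an assumption:

```sql
-- Assumed one-time schema change: let a unique key on some_identifier_id
-- drive the REPLACE semantics instead of the auto-increment id.
ALTER TABLE db2.table2
  ADD UNIQUE KEY uniq_some_identifier_id (some_identifier_id);

-- The original query with the explicit id column removed, so Galera
-- assigns auto-increment values itself and no id conflicts can occur.
REPLACE INTO db2.table2 (some_identifier_id, name, created_at, updated_at)
SELECT some_identifier_id, name, created_at, updated_at
FROM db1.table1
WHERE some_identifier_id IS NOT NULL
  AND some_identifier_id NOT IN
      (SELECT some_identifier_id
       FROM db2.table1
       WHERE some_other_identifier_id IS NOT NULL);
```

With the unique key in place, REPLACE matches existing rows on some_identifier_id rather than on id, so dropping id from the column list does not change which rows get replaced.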

  2. But the same deadlock message still appeared about 1 in 10 times, which is known Galera behaviour: Galera uses optimistic locking, which occasionally leads to deadlocks, and retrying the transaction is the usual advice in that case.

Galera Cluster uses at the cluster-level optimistic concurrency control, which can result in transactions that issue a COMMIT aborting at that stage. http://galeracluster.com/documentation-webpages/limitations.html

In short: the query ran successfully on an individual server, but failed under Galera. Removing the auto-increment primary key from the query and restarting the transaction on deadlock solved the problem.
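The retry on deadlock can be done in application code, or sketched as a stored procedure like the one below. The procedure name and the retry limit of 3 are assumptions; SQLSTATE '40001' is the class reported by the ERROR 1213 above:

```sql
DELIMITER //

-- Hypothetical wrapper: retry the REPLACE a few times when Galera
-- aborts it with a deadlock (SQLSTATE '40001').
CREATE PROCEDURE sync_table2_with_retry()
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE deadlocked INT DEFAULT 1;
  -- On deadlock, set a flag and continue with the next statement
  -- instead of aborting the whole procedure.
  DECLARE CONTINUE HANDLER FOR SQLSTATE '40001' SET deadlocked = 1;

  WHILE deadlocked = 1 AND attempts < 3 DO
    SET deadlocked = 0;
    SET attempts = attempts + 1;
    REPLACE INTO db2.table2 (some_identifier_id, name, created_at, updated_at)
    SELECT some_identifier_id, name, created_at, updated_at
    FROM db1.table1
    WHERE some_identifier_id IS NOT NULL
      AND some_identifier_id NOT IN
          (SELECT some_identifier_id
           FROM db2.table1
           WHERE some_other_identifier_id IS NOT NULL);
    -- If the handler fired, deadlocked is 1 again and the loop retries.
  END WHILE;
END //

DELIMITER ;
```

A call to sync_table2_with_retry() then either succeeds within three attempts or gives up silently; you may prefer to RESIGNAL after the last attempt so callers still see the failure.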

[Edit]

  1. I've written a post explaining the schema, environment, the issue, and how I worked around it. It may be useful to someone facing the same problem.

  2. The issue has been reported to the community and is still open.

answered Oct 22 '22 by Kuntal Chandra