Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concurrently retrieve (select) or create (insert) new row in generic SQL without conflicts

I have a system which has a complex primary key for interfacing with external systems, and a fast, small opaque primary key for internal use. For example: the external key might be a compound value - something like (given name (varchar), family name (varchar), zip code (char)) and the internal key would be an integer ("customer ID").

When I receive an incoming request with the external key, I need to look up the internal key - and here's the tricky part - allocate a new internal key if I don't already have one for the given external ID.

Obviously if I have only one client talking to the database at a time, this is fine. SELECT customer_id FROM customers WHERE given_name = 'foo' AND ..., then INSERT INTO customers VALUES (...) if I don't find a value. But, if there are potentially many requests coming in from external systems concurrently, and many may arrive for a previously unheard-of customer all at once, there is a race condition where multiple clients may try to INSERT the new row.

If I were modifying an existing row, that would be easy; simply SELECT FOR UPDATE first, to acquire the appropriate row-level lock, before doing an UPDATE. But in this case, I don't have a row that I can lock, because the row doesn't exist yet!

I've come up with several solutions so far, but each of them has some pretty significant issues:

  1. Catch the error on INSERT, re-try the entire transaction from the top. This is a problem if the transaction involves a dozen customers, especially if the incoming data is potentially talking about the same customers in a different order each time. It's possible to get stuck in mutually recursive deadlock loops, where the conflict occurs on a different customer each time. You can mitigate this with an exponential wait time between re-try attempts, but this is a slow and expensive way to deal with conflicts. Also, this complicates the application code quite a bit as everything needs to be restartable.
  2. Use savepoints. Start a savepoint before the SELECT, catch the error on INSERT, and then roll back to the savepoint and SELECT again. Savepoints aren't completely portable, and their semantics and capabilities differ slightly and subtly between databases; the biggest difference I've noticed is that, sometimes they seem to nest and sometimes they don't, so it would be nice if I could avoid them. This is only a vague impression though - is it inaccurate? Are savepoints standardized, or at least practically consistent? Also, savepoints make it difficult to do things in parallel on the same transaction, because you might not be able to tell exactly how much work you'll be rolling back, although I realize I might just need to live with that.
  3. Acquire some global lock, like a table-level lock using a LOCK statement (oracle mysql postgres). This obviously slows down these operations and results in a lot of lock contention, so I'd prefer to avoid it.
  4. Acquire a more fine-grained, but database-specific lock. I'm only familiar with Postgres's way of doing this, which is very definitely not supported in other databases (the functions even start with "pg_") so again it's a portability issue. Also, postgres's way of doing this would require me to convert the key into a pair of integers somehow, which it may not neatly fit into. Is there a nicer way to acquire locks for hypothetical objects?

It seems to me that this has got to be a common concurrency problem with databases but I haven't managed to find a lot of resources on it; possibly just because I don't know the canonical phrasing. Is it possible to do this with some simple extra bit of syntax, in any of the tagged databases?

like image 279
Glyph Avatar asked Mar 08 '11 00:03

Glyph


2 Answers

I'm not clear on why you can't use INSERT IGNORE, which will run without error and you can check if an insert occurred (modified records). If the insert "fails", then you know the key already exists and you can do a SELECT. You could do the INSERT first, then the SELECT.

Alternatively, if you are using MySQL, use InnoDB which supports transactions. That would make it easier to rollback.

like image 93
Brent Baisley Avatar answered Oct 21 '22 11:10

Brent Baisley


Perform each customer's "lookup or maybe create" operations in autocommit mode, prior to and outside of the main, multi-customer transaction.

like image 42
pilcrow Avatar answered Oct 21 '22 13:10

pilcrow