I've been reading other questions on how to implement if-exists-insert-else-update semantics in EF, but either I'm not understanding how the answers work, or they are in fact not addressing the issue. A common solution offered is to wrap the work in a transaction scope (eg: Implementing if-not-exists-insert using Entity Framework without race conditions):
using (var scope = new TransactionScope()) // default isolation level is Serializable
using (var context = new MyEntities())
{
    var user = context.Users.SingleOrDefault(u => u.Id == userId); // *
    if (user != null)
    {
        // update the user
        user.property = newProperty;
        context.SaveChanges();
    }
    else
    {
        user = new User
        {
            // etc
        };
        context.Users.AddObject(user);
        context.SaveChanges();
    }
    scope.Complete(); // commit the transaction
}
But I fail to see how this solves anything, as for this to work, the line I have starred above should block if a second thread tries to access the same user ID, unblocking only when the first thread has finished its work. Using a transaction will not cause this however, and we'll get an UpdateException thrown due to the key violation that occurs when the second thread attempts to create the same user for a second time.
Instead of catching the exception caused by the race condition, it would be better to prevent the race condition from happening in the first place. One way to do this would be for the starred line to take out an exclusive lock on the database row that matches its condition, meaning that in the context of this block, only one thread at a time could work with a user.
It seems that this must be a common problem for users of the EF, so I'm looking for a clean, generic solution that I can use everywhere.
I'd really like to avoid using a stored procedure to create my user if possible.
Any ideas?
EDIT: I tried executing the above code concurrently on two different threads using the same user ID, and despite taking out serializable transactions, they were both able to enter the critical section (*) concurrently. This led to an UpdateException being thrown when the second thread attempted to insert the same user ID that the first had just inserted. This is because, as Ladislav points out below, a serializable transaction takes exclusive locks only once it begins modifying data; reading takes only shared locks.
To avoid race conditions, any operation on a shared resource (one that multiple threads can access) must be executed atomically. One way to achieve atomicity is to use critical sections: mutually exclusive parts of the program.
How can we do pessimistic locking? We can do it by specifying an isolation level in SQL Server stored procedures, at the ADO.NET level, or by using a TransactionScope object.
When using a serializable transaction, SQL Server issues shared locks on the records or tables it reads. Shared locks don't allow other transactions to modify the locked data (those transactions will block), but they do allow other transactions to read the data before the locking transaction starts modifying it. That is why the example doesn't work: concurrent reads are allowed under shared locks right up until the first transaction starts modifying data.
You want isolation where the select command locks the table exclusively for a single client. It must lock the whole table, because otherwise it cannot prevent a concurrent insert of "the same" record. Granular control over locking records or tables from select commands is possible with locking hints, but you must write direct SQL queries to use them - EF has no support for that. I described an approach for exclusively locking the table here, but it effectively serializes access to the table and affects all other clients using it.
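As a sketch of the hint-based route (assuming SQL Server, a Users table, and EF 6's DbContext.Database.SqlQuery; the names here are illustrative, adjust them to your schema), a SELECT ... WITH (UPDLOCK, HOLDLOCK) executed inside the serializable transaction locks the matching key, or the key range where the row would be inserted, so a second client blocks at the read instead of failing later at the insert:

```csharp
using (var scope = new TransactionScope()) // Serializable by default
using (var context = new MyEntities())
{
    // UPDLOCK takes an update lock on the matching row; HOLDLOCK (serializable
    // semantics) also locks the key range, so this blocks other clients even
    // when the row does not exist yet.
    var user = context.Database
        .SqlQuery<User>(
            "SELECT * FROM Users WITH (UPDLOCK, HOLDLOCK) WHERE Id = @p0",
            userId)
        .SingleOrDefault();

    if (user != null)
    {
        // update path
        user.property = newProperty;
    }
    else
    {
        // insert path - safe, because the key range is still locked
        context.Users.Add(new User { Id = userId /* etc */ });
    }

    context.SaveChanges();
    scope.Complete();
}
```

Note that entities materialized via SqlQuery on Database are not tracked by the context, so for the update path you may need to attach the entity (or re-read it through the Users set) before SaveChanges will pick up the change.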
If you are really sure that this operation happens only in this single method and that no other applications use your database, you can simply place the code in a critical section (.NET synchronization, for example with the lock statement) and ensure on the .NET side that only a single thread can enter it. That is not an entirely reliable solution, but any playing with locks and transaction isolation levels has a big impact on database performance and throughput. You can combine this approach with optimistic concurrency (unique constraints, timestamps, etc.).
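A minimal sketch of that application-level critical section (the UserUpserter class and UpsertUser helper are illustrative, not from the original code; the lock object must be shared by every caller that touches the Users table):

```csharp
public static class UserUpserter
{
    // One lock object shared by all threads in this process.
    private static readonly object _userLock = new object();

    public static void UpsertUser(int userId, string newProperty)
    {
        // Only one thread at a time may run the read-then-write sequence,
        // so the existence check and the insert/update are atomic
        // within this process.
        lock (_userLock)
        {
            using (var context = new MyEntities())
            {
                var user = context.Users.SingleOrDefault(u => u.Id == userId);
                if (user != null)
                {
                    user.property = newProperty; // update
                }
                else
                {
                    context.Users.Add(new User { Id = userId, property = newProperty }); // insert
                }
                context.SaveChanges();
            }
        }
    }
}
```

This protects only threads inside the one process; a second web server or application hitting the same database bypasses the lock entirely, which is why the answer suggests pairing it with a unique constraint as a backstop.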
Just to add my approach: it doesn't really deal with the annoyance of exceptions being thrown, and transactions don't quite cut it as a scalable solution, but it does stop race conditions from causing problems where lock-based solutions are not possible (or not easily managed), such as in distributed systems.
I very simply use the exception and try the insert first. I use a modification of your original code as an example:
using (var context = new MyEntities())
{
    EntityEntry entityUser = null;
    try
    {
        var user = new User
        {
            // etc
        };
        entityUser = context.Users.Add(user);
        context.SaveChanges(); // Will throw if the entity already exists
    }
    catch (DbUpdateException x) when (x.InnerException != null
        && x.InnerException.Message.StartsWith("Cannot insert duplicate key row in object"))
    {
        if (entityUser != null)
        {
            // Detach the entity so it stops hanging around on the context
            entityUser.State = EntityState.Detached;
        }

        var user = context.Users.Find(userId);
        if (user != null) // just in case someone deleted it in the meantime
        {
            // update the user
            user.property = newProperty;
            context.SaveChanges();
        }
    }
}
It's not pretty, but it works and might be of use to someone.