
What is an efficient way to handle inserts of unique "immutable" entities by multiple producers with optimistic concurrency approach?

Assume a system with multiple concurrent producers that each strives to persist some graph of objects with the following common entities uniquely identifiable by their names:

CREATE TABLE CommonEntityGroup(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);
GO

CREATE UNIQUE INDEX IX_CommonEntityGroup_Name 
    ON CommonEntityGroup(Name)
GO


CREATE TABLE CommonEntity(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    CommonEntityGroupId INT NOT NULL,
    CONSTRAINT FK_CommonEntity_CommonEntityGroup FOREIGN KEY(CommonEntityGroupId) 
        REFERENCES CommonEntityGroup(Id)
);
GO

CREATE UNIQUE INDEX IX_CommonEntity_CommonEntityGroupId_Name 
    ON CommonEntity(CommonEntityGroupId, Name)
GO

For example, producer A saves some CommonEntityMeetings, while producer B saves CommonEntitySets. Each of them has to persist the CommonEntitys related to its particular items.

Basically, the key points are:

  • There are independent producers.
  • They operate concurrently.
  • Theoretically (though that may change and is not exactly true yet) they will operate through the same Web Service (ASP.NET Web API), just with their respective endpoints/"resources". So ideally the proposed solution should not rely on that.
  • They strive to persist different graphs of objects that contain possibly not yet existing CommonEntity/CommonEntityGroup objects.
  • CommonEntity/CommonEntityGroup are immutable once created and will never be modified or removed thereafter.
  • CommonEntity/CommonEntityGroup are unique according to some of their properties (Name and the related common entity, if any; e.g. CommonEntity is unique by CommonEntity.Name + CommonEntityGroup.Name).
  • Producers do not know/care about the IDs of those CommonEntities - they usually just pass DTOs with the (unique) Names of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name.
  • There is a certain possibility that producers will try to create the same CommonEntity/CommonEntityGroup at the same time.
  • Though it is much more likely that such CommonEntity/CommonEntityGroup objects will already exist in the db.

So, with Entity Framework (database first, though it probably doesn't matter) as the DAL and SQL Server as storage, what is an efficient and reliable way to ensure that all those producers will successfully persist their intersecting object graphs at the same time?

Taking into account that the UNIQUE INDEX already ensures that there won't be duplicate CommonEntities (the Name, GroupName pair is unique), I can see the following solutions:

  1. Ensure that each CommonEntity/CommonEntityGroup is found/created + SaveChanges()'d before building the rest of the object graph.

In such a case when SaveChanges is called for related entities there won't be any index violations due to other producers creating the same entities a moment before.

To achieve it I will have something like:

public class CommonEntityGroupRepository // sort of
{
    public CommonEntityGroupRepository(EntitiesDbContext db) ...

    // CommonEntityRepository will use this class/method internally to create parent CommonEntityGroup.
    public CommonEntityGroup FindOrCreateAndSave(String groupName)
    {
        return
            this.TryFind(groupName) ?? // db.FirstOrDefault(...)
            this.CreateAndSave(groupName);
    }

    private CommonEntityGroup CreateAndSave(String groupName)
    {
        var group = this.Db.CommonEntityGroups.Create();
        group.Name = groupName;
        this.Db.CommonEntityGroups.Add(group);

        try
        {
            this.Db.SaveChanges();
            return group;
        }
        catch (DbUpdateException dbExc)
        {
            // Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)
            return this.Find(groupName); // Like TryFind, but throws if the group is still missing.
        }
    }
}
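For the DbUpdateException check in that catch block, here is a minimal sketch of what the unique-violation test could look like (assuming EF6 on SQL Server, where error numbers 2601 and 2627 signal unique index/constraint violations; the IsUniqueViolation helper name is my own):

using System;
using System.Data.Entity.Infrastructure; // DbUpdateException (EF6)
using System.Data.SqlClient;             // SqlException, SqlError
using System.Linq;

public static class DbUpdateExceptionExtensions
{
    // SQL Server raises error 2601 for a unique index violation
    // and 2627 for a unique constraint violation.
    public static bool IsUniqueViolation(this DbUpdateException exception)
    {
        // Walk the inner exceptions down to the underlying SqlException.
        for (Exception e = exception; e != null; e = e.InnerException)
        {
            var sqlException = e as SqlException;
            if (sqlException != null)
                return sqlException.Errors.Cast<SqlError>()
                    .Any(error => error.Number == 2601 || error.Number == 2627);
        }
        return false;
    }
}

The catch block can then rethrow anything that is not a duplicate-key error instead of swallowing it.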

With this approach there will be multiple calls to SaveChanges, and each CommonEntity will need its own sort of repository, but it seems to be the most reliable solution.

  2. Just create the entire graph and rebuild it from scratch if index violations occur

A bit ugly and inefficient (with 10 CommonEntities we may have to retry it 10 times), but simple and more or less reliable.

  3. Just create the entire graph and replace duplicate entries if index violations occur

Not sure that there is an easy and reliable way to replace duplicate entries in more or less complex object graphs, though both a case-specific and a more generic reflection-based solution could be implemented.

Still, like the previous solution, it may require multiple retries.

  4. Try to move this logic to the database (SP)

I doubt that it will be any easier to handle inside a stored procedure. It will be the same optimistic or pessimistic approach, just implemented on the database side.

Though it may provide better performance (not an issue in this case) and put the insertion logic in one common place.

  5. Using the SERIALIZABLE isolation level/TABLOCKX+SERIALIZABLE table hint in a stored procedure - it should definitely work, but I'd prefer not to lock the tables exclusively for longer than is actually necessary, because the actual race is quite rare. And as already mentioned in the title, I'd like to find some optimistic concurrency approach.

I would probably try the first solution, but perhaps there are better alternatives or some potential pitfalls.

Eugene Podskal asked Dec 26 '16

4 Answers

Table Valued Parameters

One option is to use table valued parameters instead of individual calls to the database.

Example procedure using a table valued parameter:

create type dbo.CommonEntity_udt as table (
    CommonEntityGroupId int not null
  , Name      nvarchar(100) not null
  , primary key (CommonEntityGroupId,Name)
    );
go

create procedure dbo.CommonEntity_set (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1 
      from @CommonEntity as s
        where not exists (
          select 1 
            from dbo.CommonEntity as t
            where s.Name = t.Name
              and s.CommonEntityGroupId = t.CommonEntityGroupId
            ))
    begin;
      insert dbo.CommonEntity (Name, CommonEntityGroupId)
        select s.Name, s.CommonEntityGroupId
          from @CommonEntity as s
          where not exists (
            select 1 
              from dbo.CommonEntity as t with (updlock, holdlock)
              where s.Name = t.Name
                and s.CommonEntityGroupId = t.CommonEntityGroupId
              );
    end;
end;
go
  • Testing an Insert for Race Conditions with Ostress.exe - Kendra Little
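For completeness, a hedged sketch of how the TVP could be passed from C# with plain ADO.NET (the type and procedure names match the definitions above; the connection string and error handling are placeholders):

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class CommonEntityWriter
{
    public static void SaveCommonEntities(
        string connectionString, IEnumerable<Tuple<int, string>> entities)
    {
        // Shape the rows exactly like dbo.CommonEntity_udt.
        var table = new DataTable();
        table.Columns.Add("CommonEntityGroupId", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (var entity in entities)
            table.Rows.Add(entity.Item1, entity.Item2);

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.CommonEntity_set", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.AddWithValue("@CommonEntity", table);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.CommonEntity_udt";

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}

This way the whole batch of names goes to the server in a single round trip, and the procedure decides what is missing.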

table valued parameter reference:

  • SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven! - Leonard Lobel
  • Using Table Valued Parameters in Entity Framework - Ritesh Sharma
  • Using Table-Valued Parameters in SQL Server and .NET - Erland Sommarskog
  • how to use TVPs with Entity Framework 4.1 and CodeFirst
  • Maximizing Performance with Table-Valued Parameters - Dan Guzman

I don't recommend merge unless there is a compelling argument for it. This situation is only looking at inserting, so it seems like overkill.

Example merge version with table valued parameter:

create procedure dbo.CommonEntity_merge (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1 
      from @CommonEntity as s
        where not exists (
          select 1 
            from dbo.CommonEntity as t
            where s.Name = t.Name
              and s.CommonEntityGroupId = t.CommonEntityGroupId
            ))
    begin;
      merge dbo.CommonEntity with (holdlock) as t
      using (select CommonEntityGroupId, Name from @CommonEntity) as s
      on (t.Name = s.Name
        and s.CommonEntityGroupId = t.CommonEntityGroupId)
      when not matched by target
        then insert (CommonEntityGroupId, Name) 
        values (s.CommonEntityGroupId, s.Name);
    end;
end;
go

merge reference:

  • Use Caution with SQL Server's MERGE Statement - Aaron Bertrand
  • The Case of the Blocking Merge Statement (LCK_M_RS_U locks) - Kendra Little
  • UPSERT Race Condition With Merge - sqlteam
  • An Interesting MERGE Bug - Paul White
  • Can I optimize this merge statement - Aaron Bertrand
  • If you are using indexed views and MERGE, please read this! - Aaron Bertrand


ignore_dup_key code comment:

// Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)

ignore_dup_key is going to use serializable behind the scenes; it adds potentially costly overhead on non-clustered indexes, and even when the index is clustered it can have significant costs depending on the amount of duplicates.

This can be handled in the stored procedures using Sam Saffron's upsert (update/insert) pattern, or one of the patterns shown here: Performance impact of different error handling techniques - Aaron Bertrand.


SqlZim answered Nov 13 '22


Based on your last key point, another solution is to move your "creation" logic to a central application server/service (see Update 2) that has a queue users can use to "add" records.

Since most of your records already exist, if you use some sort of caching you should be able to make this quite efficient.

Now, about the number of records: you have to keep in mind that EF was not designed to support "bulk" operations, so creating thousands of records will be (really, really) slow.

I have used two solutions that help you insert a huge number of records very fast:

1) EntityFramework.BulkInsert
2) SqlBulkCopy

Both are extremely easy to use.

Also, I hope you've already seen Fastest Way of Inserting in Entity Framework.
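As an illustration, here is a minimal SqlBulkCopy sketch targeting the CommonEntity table from the question (the connection string is a placeholder, and you would still need the find-or-create step to resolve group IDs first):

using System.Data;
using System.Data.SqlClient;

public static class CommonEntityBulkLoader
{
    public static void BulkInsertCommonEntities(string connectionString, DataTable rows)
    {
        // rows must contain an int CommonEntityGroupId column and a string Name column.
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.CommonEntity";
            bulkCopy.ColumnMappings.Add("CommonEntityGroupId", "CommonEntityGroupId");
            bulkCopy.ColumnMappings.Add("Name", "Name");
            bulkCopy.WriteToServer(rows);
        }
    }
}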

Update

Below is another solution that I've used twice recently. Instead of saving your record when a user performs a "Save", schedule the save to happen X seconds later. If in the meantime someone else tries to save the same record, just "slide" the scheduled date.

Below you can see sample code that tries to save the same record 10 times (at the same time), but the actual save only happens once.

(Screenshot of the console output omitted: the save happens only once.)

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace ConsoleApplicationScheduler
{
    class Program
    {
        static void Main(string[] args)
        {
            ConcurrentSaveService service = new ConcurrentSaveService();
            int entity = 1;
            for (int i = 0; i < 10; i++)
            {
                //Save the same record 10 times (this could be concurrent)
                service.BeginSave(entity);
            }

            Console.ReadLine();
        }
    }

    public class ConcurrentSaveService
    {
        private static readonly ConcurrentDictionary<int, DateTime> _trackedSubjectsDictionary = new ConcurrentDictionary<int, DateTime>();

        private readonly int _delayInSeconds;

        public ConcurrentSaveService()
        {
            _delayInSeconds = 5;
        }

        public async void BeginSave(int key)
        {
            Console.WriteLine("Started Saving");
            DateTime existingTaskDate;
            _trackedSubjectsDictionary.TryGetValue(key, out existingTaskDate);

            DateTime scheduledDate = DateTime.Now.AddSeconds(_delayInSeconds);
            _trackedSubjectsDictionary.AddOrUpdate(key, scheduledDate, (i, d) => scheduledDate);

            if (existingTaskDate > DateTime.Now)
                return;

            do
            {
                await Task.Delay(TimeSpan.FromSeconds(_delayInSeconds));

                DateTime loadedScheduledDate;
                _trackedSubjectsDictionary.TryGetValue(key, out loadedScheduledDate);
                if (loadedScheduledDate > DateTime.Now)
                    continue;

                if (loadedScheduledDate == DateTime.MinValue)
                    break;

                _trackedSubjectsDictionary.TryRemove(key, out loadedScheduledDate);

                if (loadedScheduledDate > DateTime.MinValue)
                {
                    //DoWork
                    Console.WriteLine("Update/Insert record:" + key);
                }

                break;
            } while (true);

            Console.WriteLine("Finished Saving");
        }
    }
}

Update 2

Since you can control the "creation" process in your WebAPI app, you should be able to avoid duplicates using some sort of cache, as in the following pseudocode:

using System.Collections.Concurrent;
using System.Web.Http;

namespace WebApplication2.Controllers
{
    public class ValuesController : ApiController
    {
        static object _lock = new object();
        static ConcurrentDictionary<string, object> cache = new ConcurrentDictionary<string, object>();
        public object Post(InputModel value)
        {
            // ConcurrentDictionary's indexer throws when the key is missing,
            // so use TryGetValue for the fast path.
            object existing;
            if (cache.TryGetValue(value.Name, out existing))
                return existing; //Your saved record

            lock (_lock)
            {
                if (cache.TryGetValue(value.Name, out existing))
                    return existing; //Your saved record

                object newRecord = new object(); //Save your Object

                cache.AddOrUpdate(value.Name, newRecord, (s, o) => newRecord);

                return newRecord;
            }
        }
    }

    public class InputModel
    {
        public string Name;
    }
}
George Vovos answered Nov 13 '22


Producers do not know/care about the IDs of those CommonEntities - they usually just pass DTOs with the (unique) Names of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name.

I assume that tables that store your objects reference CommonEntity by their ID, not Name.

I assume that the object's table definition looks something like this:

CREATE TABLE SomeObject(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    ObjectName NVARCHAR(100) NOT NULL,
    CommonEntityId INT NOT NULL,
    CONSTRAINT FK_SomeObject_CommonEntity FOREIGN KEY(CommonEntityId) 
        REFERENCES CommonEntity(Id)
);

At the same time, the high-level SaveSomeObject function has CommonEntity.Name and CommonEntityGroup.Name (not IDs) as parameters. It means that somewhere the function has to resolve the entity's Name into its corresponding ID.

So, the high-level SaveSomeObject function with parameters (ObjectName, CommonEntityName, CommonEntityGroupName) can be implemented as two steps:

CommonEntityID = GetCommonEntityID(CommonEntityName, CommonEntityGroupName);
SaveSomeObject(ObjectName, CommonEntityID);

GetCommonEntityID is a helper function/stored procedure that looks up entity's ID by its Name and creates an entity (generates an ID) if needed.

Here we explicitly extract this step into a separate dedicated function. Only this function has to deal with concurrency issues, and it can be implemented using either an optimistic or a pessimistic concurrency approach. The user of this function doesn't care what magic it uses to return a valid ID, but can be sure that the returned ID can safely be used for persisting the rest of the object.
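As a sketch, calling such a procedure from C# could look like this (plain ADO.NET with an output parameter; the connection string is a placeholder, and the same call works alongside an EF context):

using System.Data;
using System.Data.SqlClient;

public static class CommonEntityResolver
{
    public static int GetCommonEntityID(
        string connectionString, string entityName, string groupName)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.GetCommonEntityID", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@ParamCommonEntityName", entityName);
            command.Parameters.AddWithValue("@ParamCommonEntityGroupName", groupName);
            var idParameter = command.Parameters.Add("@ParamCommonEntityID", SqlDbType.Int);
            idParameter.Direction = ParameterDirection.Output;

            connection.Open();
            command.ExecuteNonQuery();
            return (int)idParameter.Value;
        }
    }
}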


Pessimistic concurrency approach

Pessimistic concurrency approach is simple. Make sure that only one instance of GetCommonEntityID can be run. I'd use sp_getapplock for it (instead of SERIALIZABLE transaction isolation level or table hints). sp_getapplock is essentially a mutex and once a lock is obtained we can be sure that no other instance of this stored procedure is running in parallel. This makes the logic simple - try to read the ID and INSERT the new row if it is not found.

CREATE PROCEDURE [dbo].[GetCommonEntityID]
    @ParamCommonEntityName NVARCHAR(100),
    @ParamCommonEntityGroupName NVARCHAR(100),
    @ParamCommonEntityID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;
    BEGIN TRY

        SET @ParamCommonEntityID = NULL;
        DECLARE @VarCommonEntityGroupID int = NULL;

        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'GetCommonEntityID_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock

            SELECT @VarCommonEntityGroupID = ID
            FROM CommonEntityGroup
            WHERE Name = @ParamCommonEntityGroupName;

            IF @VarCommonEntityGroupID IS NULL
            BEGIN
                -- Such name doesn't exist, create it.
                INSERT INTO CommonEntityGroup (Name)
                VALUES (@ParamCommonEntityGroupName);

                SET @VarCommonEntityGroupID = SCOPE_IDENTITY();
            END;

            SELECT @ParamCommonEntityID = ID
            FROM CommonEntity
            WHERE
                Name = @ParamCommonEntityName
                AND CommonEntityGroupId = @VarCommonEntityGroupID
            ;

            IF @ParamCommonEntityID IS NULL
            BEGIN
                -- Such name doesn't exist, create it.
                INSERT INTO CommonEntity
                    (Name
                    ,CommonEntityGroupId)
                VALUES
                    (@ParamCommonEntityName
                    ,@VarCommonEntityGroupID);

                SET @ParamCommonEntityID = SCOPE_IDENTITY();
            END;

        END ELSE BEGIN
            -- TODO: process the error. Retry
        END;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
            -- TODO: process the error. Retry?
    END CATCH;

END

Optimistic concurrency approach

Do not try to lock anything. Act optimistically and look up the ID. If not found, try to INSERT the new value and retry if there is a unique index violation.

CREATE PROCEDURE [dbo].[GetCommonEntityID]
    @ParamCommonEntityName NVARCHAR(100),
    @ParamCommonEntityGroupName NVARCHAR(100),
    @ParamCommonEntityID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    SET @ParamCommonEntityID = NULL;
    DECLARE @VarCommonEntityGroupID int = NULL;

    SELECT @VarCommonEntityGroupID = ID
    FROM CommonEntityGroup
    WHERE Name = @ParamCommonEntityGroupName;

    WHILE @VarCommonEntityGroupID IS NULL
    BEGIN
        -- Such name doesn't exist, create it.
        BEGIN TRANSACTION;
        BEGIN TRY

            INSERT INTO CommonEntityGroup (Name)
            VALUES (@ParamCommonEntityGroupName);

            SET @VarCommonEntityGroupID = SCOPE_IDENTITY();

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            ROLLBACK TRANSACTION;
            -- TODO: Use ERROR_NUMBER() and ERROR_STATE() to check that
            -- error is indeed due to unique index violation and retry
        END CATCH;

        SELECT @VarCommonEntityGroupID = ID
        FROM CommonEntityGroup
        WHERE Name = @ParamCommonEntityGroupName;

    END;


    SELECT @ParamCommonEntityID = ID
    FROM CommonEntity
    WHERE
        Name = @ParamCommonEntityName
        AND CommonEntityGroupId = @VarCommonEntityGroupID
    ;

    WHILE @ParamCommonEntityID IS NULL
    BEGIN
        -- Such name doesn't exist, create it.
        BEGIN TRANSACTION;
        BEGIN TRY

            INSERT INTO CommonEntity
                (Name
                ,CommonEntityGroupId)
            VALUES
                (@ParamCommonEntityName
                ,@VarCommonEntityGroupID);

            SET @ParamCommonEntityID = SCOPE_IDENTITY();

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            ROLLBACK TRANSACTION;
            -- TODO: Use ERROR_NUMBER() and ERROR_STATE() to check that
            -- error is indeed due to unique index violation and retry
        END CATCH;

        SELECT @ParamCommonEntityID = ID
        FROM CommonEntity
        WHERE
            Name = @ParamCommonEntityName
            AND CommonEntityGroupId = @VarCommonEntityGroupID
        ;

    END;

END

In both approaches you should have retry logic. The optimistic approach is generally better when you expect the Names to already be in the entity table and the likelihood of retries is low (as in your case, described in the question). The pessimistic approach is generally better when you expect many competing processes trying to insert the same Names; in that case you are likely to be better off serialising the inserts.
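A minimal client-side sketch of such retry logic (the attempt count and backoff are arbitrary assumptions; in real code you would inspect the SqlException number before deciding to retry):

using System;
using System.Threading;

public static class RetryHelper
{
    public static T WithRetries<T>(Func<T> action, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Back off a little before retrying (e.g. after a lock timeout or deadlock).
                Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
            }
        }
    }
}

// Usage: int id = RetryHelper.WithRetries(() => GetCommonEntityID(name, groupName));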

Vladimir Baranov answered Nov 12 '22


Well, choosing the approach will certainly depend on the type of functionality and the amount of data that both procedures will be using.

If we go with the first approach, then for each SaveChanges() call Entity Framework will certainly put up a transaction. This could reduce performance a bit in the case of a large number of records.

If a considerable number of records needs to be inserted/updated, then I would surely go with the stored procedure based approach. With this approach you will have full control over the database, and querying for a record to check whether it exists will be very easy (though some fine-tuning may be required here). I don't see any challenges in implementing the same with stored procedures. With a few implementation optimizations, such as loading the data into temporary tables (not SQL temp tables, but physical tables that can be used to store data temporarily), this can be further enhanced to keep a full log of what the stored procedure has processed.

Mads... answered Nov 12 '22