Assume a system with multiple concurrent producers, each of which tries to persist a graph of objects containing the following common entities, uniquely identifiable by their names:
CREATE TABLE CommonEntityGroup(
Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
Name NVARCHAR(100) NOT NULL
);
GO
CREATE UNIQUE INDEX IX_CommonEntityGroup_Name
ON CommonEntityGroup(Name)
GO
CREATE TABLE CommonEntity(
Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
Name NVARCHAR(100) NOT NULL,
CommonEntityGroupId INT NOT NULL,
CONSTRAINT FK_CommonEntity_CommonEntityGroup FOREIGN KEY(CommonEntityGroupId)
REFERENCES CommonEntityGroup(Id)
);
GO
CREATE UNIQUE INDEX IX_CommonEntity_CommonEntityGroupId_Name
ON CommonEntity(CommonEntityGroupId, Name)
GO
For example, producer A saves some CommonEntityMeetings, while producer B saves CommonEntitySets. Each of them has to persist the CommonEntitys related to their particular items.
Basically, the key points are:

- Each common entity is unique by its Name and its related common entity, if any (e.g. CommonEntity is unique by CommonEntity.Name + CommonEntityGroup.Name).
- Producers do not know/care about IDs of those CommonEntities - they usually just pass DTOs with Names (unique) of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name.

So, with Entity Framework (database first, though it probably doesn't matter) as DAL and SQL Server as storage, what is an efficient and reliable way to ensure that all those producers will successfully persist their intersecting object graphs at the same time?
Taking into account that the UNIQUE INDEX already ensures that there won't be duplicate CommonEntities (the (Name, GroupName) pair is unique), I can see the following solutions:

Find or create each common entity up front, with its own SaveChanges call. In such a case, when SaveChanges is called for the related entities, there won't be any index violations due to other producers creating the same entities a moment before. To achieve it I will have something like:
public class CommonEntityGroupRepository // sort of
{
    public CommonEntityGroupRepository(EntitiesDbContext db) { this.Db = db; }

    private EntitiesDbContext Db { get; }

    // CommonEntityRepository will use this class/method internally
    // to create the parent CommonEntityGroup.
    public CommonEntityGroup FindOrCreateAndSave(String groupName)
    {
        return
            this.TryFind(groupName) ?? // db.CommonEntityGroups.FirstOrDefault(...)
            this.CreateAndSave(groupName);
    }

    private CommonEntityGroup CreateAndSave(String groupName)
    {
        var group = this.Db.CommonEntityGroups.Create();
        group.Name = groupName;
        this.Db.CommonEntityGroups.Add(group);
        try
        {
            this.Db.SaveChanges();
            return group;
        }
        catch (DbUpdateException)
        {
            // Check that it was a Name index violation
            // (perhaps make the indices IGNORE_DUP_KEY).
            return this.Find(groupName); // TryFind that throws if not found.
        }
    }
}
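To make the catch less blind, the violation check could look like this (my sketch, assuming EF6 on SQL Server, where a duplicate key surfaces as SqlException number 2601 or 2627 inside the DbUpdateException):

using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;

internal static class SqlErrorHelper
{
    // Sketch: true when the DbUpdateException was caused by a unique
    // index/constraint violation (SQL Server errors 2601 and 2627).
    public static bool IsUniqueViolation(DbUpdateException exc)
    {
        // GetBaseException() unwraps the nested exceptions down to the root cause.
        var sqlException = exc.GetBaseException() as SqlException;
        return sqlException != null
            && (sqlException.Number == 2601     // duplicate key row in a unique index
             || sqlException.Number == 2627);   // violation of a unique constraint
    }
}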
With this approach there will be multiple calls to SaveChanges, and each CommonEntity will have its own sort of repository, though it seems to be the most reliable solution.

Another option is to catch the index violation when the whole graph is saved and retry. A bit ugly and inefficient (with 10 CommonEntities we may have to retry it 10 times), but simple and more or less reliable. Not sure that there is an easy and reliable way to replace duplicate entries in more or less complex object graphs, though both case-specific and more generic reflection-based solutions can be implemented. Still, like the previous solution, it may require multiple retries.

I doubt that it will be any easier to handle inside a stored procedure. It would be the same optimistic or pessimistic approach, just implemented on the database side, though it may provide better performance (not an issue in this case) and put the insertion logic into one common place.

I would probably try the first solution, but perhaps there are better alternatives or some potential pitfalls.
Table Valued Parameters
One option is to use table-valued parameters instead of individual calls to the database.
Example procedure using a table valued parameter:
create type dbo.CommonEntity_udt as table (
CommonEntityGroupId int not null
, Name nvarchar(100) not null
, primary key (CommonEntityGroupId,Name)
);
go
create procedure dbo.CommonEntity_set (
@CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
set nocount on;
set xact_abort on;
if exists (
select 1
from @CommonEntity as s
where not exists (
select 1
from dbo.CommonEntity as t
where s.Name = t.Name
and s.CommonEntityGroupId = t.CommonEntityGroupId
))
begin;
insert dbo.CommonEntity (Name, CommonEntityGroupId)
select s.Name, s.CommonEntityGroupId
from @CommonEntity as s
where not exists (
select 1
from dbo.CommonEntity as t with (updlock, holdlock)
where s.Name = t.Name
and s.CommonEntityGroupId = t.CommonEntityGroupId
);
end;
end;
go
table-valued parameter reference: see the SQL Server documentation on table-valued parameters.
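Calling the procedure from .NET could then look roughly like the sketch below (my illustration, not part of the answer; it assumes plain ADO.NET, with a DataTable shaped like dbo.CommonEntity_udt passed as a structured parameter):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

internal static class CommonEntityWriter
{
    // Sketch: send all (CommonEntityGroupId, Name) pairs to
    // dbo.CommonEntity_set in one call via a table-valued parameter.
    public static void SaveCommonEntities(string connectionString,
                                          IEnumerable<KeyValuePair<int, string>> entities)
    {
        var table = new DataTable();
        table.Columns.Add("CommonEntityGroupId", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (var entity in entities)
            table.Rows.Add(entity.Key, entity.Value);

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.CommonEntity_set", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.AddWithValue("@CommonEntity", table);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.CommonEntity_udt";

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}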
I don't recommend merge unless there is a compelling argument for it. This situation is only looking at inserting, so it seems like overkill.

Example merge version with a table-valued parameter:
create procedure dbo.CommonEntity_merge (
@CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
set nocount on;
set xact_abort on;
if exists (
select 1
from @CommonEntity as s
where not exists (
select 1
from dbo.CommonEntity as t
where s.Name = t.Name
and s.CommonEntityGroupId = t.CommonEntityGroupId
))
begin;
merge dbo.CommonEntity with (holdlock) as t
using (select CommonEntityGroupId, Name from @CommonEntity) as s
on (t.Name = s.Name
and s.CommonEntityGroupId = t.CommonEntityGroupId)
when not matched by target
then insert (CommonEntityGroupId, Name)
values (s.CommonEntityGroupId, s.Name);
end;
end;
go
merge reference:
Use Caution with SQL Server's MERGE Statement - Aaron Bertrand

Regarding the ignore_dup_key code comment:

// Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)

ignore_dup_key is going to use serializable behind the scenes; it can add potentially costly overhead on non-clustered indexes, and even when the index is clustered it can have significant costs depending on the amount of duplicates.
This can be handled in the stored procedures using Sam Saffron's upsert (update/insert) pattern, or one of the patterns shown here: Performance impact of different error handling techniques - Aaron Bertrand.
Based on your last key point, another solution is to move your "creation" logic to a central application server/service (see Update 2) that has a queue users can use to "add" records.

Since most of your records already exist, if you use some sort of caching you should be able to make this quite efficient.

Now, about the number of records. You have to keep in mind that EF was not designed to support "bulk" operations; therefore, creating thousands of records will be (really, really) slow.

I have used two solutions that help you insert a huge number of records very fast:

1) EntityFramework.BulkInsert
2) SqlBulkCopy

Both are extremely easy to use.

Also, I hope you've already seen Fastest Way of Inserting in Entity Framework.
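For illustration, here is a minimal SqlBulkCopy sketch (not from the original answer; it assumes the rows to insert are already staged in a DataTable whose columns match dbo.CommonEntity):

using System.Data;
using System.Data.SqlClient;

internal static class BulkLoader
{
    // Sketch: bulk-load many CommonEntity rows in a single round trip
    // instead of inserting them one by one through EF.
    public static void BulkInsertCommonEntities(string connectionString, DataTable rows)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.CommonEntity";
            // Map source columns to destination columns by name.
            bulkCopy.ColumnMappings.Add("Name", "Name");
            bulkCopy.ColumnMappings.Add("CommonEntityGroupId", "CommonEntityGroupId");
            bulkCopy.WriteToServer(rows);
        }
    }
}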
Update

Below is another solution that I've used twice recently.

Instead of saving your record when a user performs a "Save", schedule the save to happen X seconds later. If in the meantime someone else tries to save the same record, just "slide" the scheduled date.

Below you can see sample code that tries to save the same record 10 times (at the same time), but the actual save only happens once.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
namespace ConsoleApplicationScheduler
{
class Program
{
static void Main(string[] args)
{
ConcurrentSaveService service = new ConcurrentSaveService();
int entity = 1;
for (int i = 0; i < 10; i++)
{
//Save the same record 10 times (this could be concurrent)
service.BeginSave(entity);
}
Console.ReadLine();
}
}
public class ConcurrentSaveService
{
private static readonly ConcurrentDictionary<int, DateTime> _trackedSubjectsDictionary = new ConcurrentDictionary<int, DateTime>();
private readonly int _delayInSeconds;
public ConcurrentSaveService()
{
_delayInSeconds = 5;
}
public async void BeginSave(int key)
{
Console.WriteLine("Started Saving");
DateTime existingTaskDate;
_trackedSubjectsDictionary.TryGetValue(key, out existingTaskDate);
DateTime scheduledDate = DateTime.Now.AddSeconds(_delayInSeconds);
_trackedSubjectsDictionary.AddOrUpdate(key, scheduledDate, (i, d) => scheduledDate);
if (existingTaskDate > DateTime.Now)
return;
do
{
await Task.Delay(TimeSpan.FromSeconds(_delayInSeconds));
DateTime loadedScheduledDate;
_trackedSubjectsDictionary.TryGetValue(key, out loadedScheduledDate);
if (loadedScheduledDate > DateTime.Now)
continue;
if (loadedScheduledDate == DateTime.MinValue)
break;
_trackedSubjectsDictionary.TryRemove(key, out loadedScheduledDate);
if (loadedScheduledDate > DateTime.MinValue)
{
//DoWork
Console.WriteLine("Update/Insert record:" + key);
}
break;
} while (true);
Console.WriteLine("Finished Saving");
}
}
}
Update 2

Since you can control the "creation" process in your WebAPI app, you should be able to avoid duplicates using some sort of cache, as in the following pseudocode:
using System.Collections.Concurrent;
using System.Web.Http;
namespace WebApplication2.Controllers
{
public class ValuesController : ApiController
{
static object _lock = new object();
static ConcurrentDictionary<string, object> cache = new ConcurrentDictionary<string, object>();
public object Post(InputModel value)
{
object existing;
cache.TryGetValue(value.Name, out existing);
if (existing != null)
return new object();//Your saved record
lock (_lock)
{
cache.TryGetValue(value.Name, out existing);
if (existing != null)
return new object();//Your saved record
object newRecord = new object();//Save your Object
cache.AddOrUpdate(value.Name, newRecord, (s, o) => newRecord);
return newRecord;
}
}
}
public class InputModel
{
public string Name;
}
}
Producers do not know/care about IDs of those CommonEntities - they usually just pass DTOs with Names (unique) of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name.

I assume that the tables that store your objects reference CommonEntity by their ID, not Name.
I assume that the object's table definition looks something like this:
CREATE TABLE SomeObject(
Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
ObjectName NVARCHAR(100) NOT NULL,
CommonEntityId INT NOT NULL,
CONSTRAINT FK_SomeObject_CommonEntity FOREIGN KEY(CommonEntityId)
REFERENCES CommonEntity(Id)
);
At the same time, the high-level SaveSomeObject function has CommonEntity.Name and CommonEntityGroup.Name (not ID) as parameters. It means that somewhere the function has to look up the entity's Name and find its corresponding ID.
So, the high-level SaveSomeObject function with parameters (ObjectName, CommonEntityName, CommonEntityGroupName) can be implemented as two steps:

CommonEntityID = GetCommonEntityID(CommonEntityName, CommonEntityGroupName);
SaveSomeObject(ObjectName, CommonEntityID);

GetCommonEntityID is a helper function/stored procedure that looks up the entity's ID by its Name and creates the entity (generating an ID) if needed.
Here we explicitly extract this step into a separate dedicated function. Only this function has to deal with concurrency issues, and it can be implemented using an optimistic or a pessimistic concurrency approach. The user of this function doesn't care what magic it uses to return a valid ID, but they can be sure that the returned ID can safely be used for persisting the rest of the object.
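As an application-side illustration (my sketch, not part of the answer; it assumes plain ADO.NET), the helper can be called like this before persisting the object:

using System.Data;
using System.Data.SqlClient;

internal static class CommonEntityResolver
{
    // Sketch: resolve (CommonEntityName, CommonEntityGroupName) to an ID
    // via the dbo.GetCommonEntityID stored procedure defined below.
    public static int GetCommonEntityID(string connectionString,
                                        string entityName, string groupName)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.GetCommonEntityID", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@ParamCommonEntityName", entityName);
            command.Parameters.AddWithValue("@ParamCommonEntityGroupName", groupName);
            var idParameter = command.Parameters.Add("@ParamCommonEntityID", SqlDbType.Int);
            idParameter.Direction = ParameterDirection.Output;

            connection.Open();
            command.ExecuteNonQuery();
            return (int)idParameter.Value;
        }
    }
}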
Pessimistic concurrency approach

The pessimistic concurrency approach is simple: make sure that only one instance of GetCommonEntityID can run at a time. I'd use sp_getapplock for it (instead of the SERIALIZABLE transaction isolation level or table hints). sp_getapplock is essentially a mutex; once the lock is obtained, we can be sure that no other instance of this stored procedure is running in parallel. This makes the logic simple: try to read the ID and INSERT a new row if it is not found.
CREATE PROCEDURE [dbo].[GetCommonEntityID]
@ParamCommonEntityName NVARCHAR(100),
@ParamCommonEntityGroupName NVARCHAR(100),
@ParamCommonEntityID int OUTPUT
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRANSACTION;
BEGIN TRY
SET @ParamCommonEntityID = NULL;
DECLARE @VarCommonEntityGroupID int = NULL;
DECLARE @VarLockResult int;
EXEC @VarLockResult = sp_getapplock
@Resource = 'GetCommonEntityID_app_lock',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 60000,
@DbPrincipal = 'public';
IF @VarLockResult >= 0
BEGIN
-- Acquired the lock
SELECT @VarCommonEntityGroupID = ID
FROM CommonEntityGroup
WHERE Name = @ParamCommonEntityGroupName;
IF @VarCommonEntityGroupID IS NULL
BEGIN
-- Such name doesn't exist, create it.
INSERT INTO CommonEntityGroup (Name)
VALUES (@ParamCommonEntityGroupName);
SET @VarCommonEntityGroupID = SCOPE_IDENTITY();
END;
SELECT @ParamCommonEntityID = ID
FROM CommonEntity
WHERE
Name = @ParamCommonEntityName
AND CommonEntityGroupId = @VarCommonEntityGroupID
;
IF @ParamCommonEntityID IS NULL
BEGIN
-- Such name doesn't exist, create it.
INSERT INTO CommonEntity
(Name
,CommonEntityGroupId)
VALUES
(@ParamCommonEntityName
,@VarCommonEntityGroupID);
SET @ParamCommonEntityID = SCOPE_IDENTITY();
END;
END ELSE BEGIN
-- TODO: process the error. Retry
END;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
-- TODO: process the error. Retry?
END CATCH;
END
Optimistic concurrency approach

Do not try to lock anything. Act optimistically: look up the ID, and if it is not found, try to INSERT the new value, retrying if there is a unique index violation.
CREATE PROCEDURE [dbo].[GetCommonEntityID]
@ParamCommonEntityName NVARCHAR(100),
@ParamCommonEntityGroupName NVARCHAR(100),
@ParamCommonEntityID int OUTPUT
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
SET @ParamCommonEntityID = NULL;
DECLARE @VarCommonEntityGroupID int = NULL;
SELECT @VarCommonEntityGroupID = ID
FROM CommonEntityGroup
WHERE Name = @ParamCommonEntityGroupName;
WHILE @VarCommonEntityGroupID IS NULL
BEGIN
-- Such name doesn't exist, create it.
BEGIN TRANSACTION;
BEGIN TRY
INSERT INTO CommonEntityGroup (Name)
VALUES (@ParamCommonEntityGroupName);
SET @VarCommonEntityGroupID = SCOPE_IDENTITY();
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
-- TODO: Use ERROR_NUMBER() and ERROR_STATE() to check that
-- error is indeed due to unique index violation and retry
END CATCH;
SELECT @VarCommonEntityGroupID = ID
FROM CommonEntityGroup
WHERE Name = @ParamCommonEntityGroupName;
END;
SELECT @ParamCommonEntityID = ID
FROM CommonEntity
WHERE
Name = @ParamCommonEntityName
AND CommonEntityGroupId = @VarCommonEntityGroupID
;
WHILE @ParamCommonEntityID IS NULL
BEGIN
-- Such name doesn't exist, create it.
BEGIN TRANSACTION;
BEGIN TRY
INSERT INTO CommonEntity
(Name
,CommonEntityGroupId)
VALUES
(@ParamCommonEntityName
,@VarCommonEntityGroupID);
SET @ParamCommonEntityID = SCOPE_IDENTITY();
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
-- TODO: Use ERROR_NUMBER() and ERROR_STATE() to check that
-- error is indeed due to unique index violation and retry
END CATCH;
SELECT @ParamCommonEntityID = ID
FROM CommonEntity
WHERE
Name = @ParamCommonEntityName
AND CommonEntityGroupId = @VarCommonEntityGroupID
;
END;
END
In both approaches you should have retry logic. The optimistic approach is generally better when you expect the Names to already be in the entity table, so the likelihood of retries is low (as in the case described in the question). The pessimistic approach is generally better when you expect many competing processes trying to insert the same Name; you are likely to be better off if you serialise the inserts.
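If the retries end up on the application side instead of inside the procedures, a minimal wrapper could look like this (my sketch; SQL Server reports unique violations as errors 2601 and 2627):

using System;
using System.Data.SqlClient;

internal static class RetryHelper
{
    // Sketch: re-run an action a few times when it fails with a unique
    // index/constraint violation; on retry the lookup should find the
    // row that the competing producer has just inserted.
    public static T RetryOnUniqueViolation<T>(Func<T> action, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (SqlException exc)
                when ((exc.Number == 2601 || exc.Number == 2627)
                      && attempt < maxAttempts)
            {
                // Swallow the duplicate-key error and loop to try again.
            }
        }
    }
}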
Choosing the approach will certainly depend on the type of functionality and the amount of data that both procedures will be handling.

If we go with the first approach, then for each SaveChanges() call Entity Framework will create a transaction. This could reduce performance somewhat with a large number of records.

If there is a considerable number of records to insert/update, I would surely go with the stored-procedure-based approach. With this approach you have full control of the database, and querying for a record to check whether it exists is very easy (though some fine tuning may be required here). I don't see any challenges in implementing the same with stored procedures. With a few implementation optimizations, such as loading the data into temporary tables (not SQL temp tables, but physical tables that can be used to store data temporarily), this can be further enhanced to keep a full processing log for the stored procedure.