
Entity Framework: Avoiding Inserting Duplicates

Say I have the following conceptual model: there are stories that have tags (more than one, so it's a many-to-many relationship), and each tag belongs to a particular category.

My data comes from an external source, and before inserting it I want to make sure that no duplicate tags are added.

Updated code snippet:

    static void Main(string[] args)
    {
        Story story1 = new Story();
        story1.Title = "Introducing the Entity Framework";
        story1.Tags.Add(new Tag { Name = ".net",  });
        story1.Tags.Add(new Tag { Name = "database" });

        Story story2 = new Story();
        story2.Title = "Working with Managed DirectX";
        story2.Tags.Add(new Tag { Name = ".net" });
        story2.Tags.Add(new Tag { Name = "graphics" });

        List<Story> stories = new List<Story>();
        stories.Add(story1);
        stories.Add(story2);

        EfQuestionEntities db = new EfQuestionEntities();

        Category category = (from c in db.Categories
                             where c.Name == "Programming"
                             select c).First();

        foreach (Story story in stories)
        {
            foreach (Tag tag in story.Tags)
            {
                Tag currentTag = tag;
                currentTag = GetTag(tag.Name, category, db);
            }

            db.Stories.AddObject(story);
        }

        db.SaveChanges();
    }

    public static Tag GetTag(string name, Category category, EfQuestionEntities db)
    {
        var dbTag = from t in db.Tags.Include("Category")
                    where t.Name == name
                    select t;

        if (dbTag.Count() > 0)
        {
            return dbTag.First();
        }

        var cachedTag = db.ObjectStateManager.GetObjectStateEntries(EntityState.Added).
            Where(ose => ose.EntitySet == db.Tags.EntitySet).
            Select(ose => ose.Entity).
            Cast<Tag>().Where(x => x.Name == name);

        if (cachedTag.Count() != 0) 
        {
            return cachedTag.First();
        }

        Tag tag = new Tag();
        tag.Name = name;
        tag.Category = category;

        db.Tags.AddObject(tag);

        return tag;
    }

However, I get an exception about an object with the same EntityKey that is already present in the ObjectContext.

Also, if I remove the else statement, I get an exception about violating a foreign key constraint, so it seems the tag's Category property is set to null.

Mike Borozdin asked Mar 21 '11 11:03



1 Answer

I've had the same problem with EF. Here's what I ended up doing:

  1. Instead of doing story1.Tags.Add(new Tag { Name = ".net", }) yourself, route all Tag creation through a helper method like this: story1.Tags.Add(GetTag(".net")).
  2. The GetTag method first queries the tags in the context for an existing entity, as you do. If it finds one, it returns that.
  3. If there is no existing entity, it checks the ObjectStateManager to see if there are Tag entities added to the context but not yet written to the db. If it finds a matching Tag, it returns that.
  4. If it still has not found the Tag, it creates a new Tag, adds it to the context, and then returns it.

In essence this will make sure that no more than one instance of any Tag (be it already existing or just created) will be used throughout your program.

Some example code lifted from my project (uses InventoryItem instead of Tag, but you get the idea).

The check in step 3 is done like this:

// Second choice: maybe it's not in the database yet, but it's awaiting insertion?
inventoryItem = context.ObjectStateManager.GetObjectStateEntries(EntityState.Added)
    .Where(ose => ose.EntitySet == context.InventoryItems.EntitySet)
    .Select(ose => ose.Entity)
    .Cast<InventoryItem>()
    .Where(equalityPredicate.Compile())
    .SingleOrDefault();

if (inventoryItem != null) {
    return inventoryItem;
}

If the Tag is not found in step 3, here's the code for step 4:

inventoryItem = new InventoryItem();
context.InventoryItems.AddObject(inventoryItem);
return inventoryItem;
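
The lookup order of steps 2 to 4 can also be sketched without Entity Framework. This is only an illustration under stated assumptions, not code from the project above: FakeContext, its Saved and Pending lists, and the simplified Tag class are hypothetical stand-ins for the database, the entities in the Added state, and the generated entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the generated Tag entity.
public class Tag
{
    public string Name { get; set; }
    public string Category { get; set; }
}

// Hypothetical stand-in for the ObjectContext: Saved plays the role of rows
// already in the database, Pending the role of entities in the Added state.
public class FakeContext
{
    public List<Tag> Saved { get; } = new List<Tag>();
    public List<Tag> Pending { get; } = new List<Tag>();
}

public static class TagLookup
{
    public static Tag GetTag(string name, string category, FakeContext db)
    {
        // Step 2: prefer a tag that is already stored.
        Tag tag = db.Saved.FirstOrDefault(t => t.Name == name);
        if (tag != null) return tag;

        // Step 3: otherwise reuse a tag that is waiting to be inserted.
        tag = db.Pending.FirstOrDefault(t => t.Name == name);
        if (tag != null) return tag;

        // Step 4: create the tag once and register it as pending.
        tag = new Tag { Name = name, Category = category };
        db.Pending.Add(tag);
        return tag;
    }
}

public static class Program
{
    public static void Main()
    {
        var db = new FakeContext();
        db.Saved.Add(new Tag { Name = "database", Category = "Programming" });

        Tag first = TagLookup.GetTag(".net", "Programming", db);
        Tag second = TagLookup.GetTag(".net", "Programming", db);

        Console.WriteLine(object.ReferenceEquals(first, second)); // prints True
        Console.WriteLine(db.Pending.Count);                      // prints 1
    }
}
```

The point of the sketch is that both calls for ".net" hand back the same instance, so only one new tag is queued for insertion.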

Update:

It should be used like this:

Story story1 = new Story();
story1.Title = "Introducing the Entity Framework";
story1.Tags.Add(GetTag(".net", category, db));
story1.Tags.Add(GetTag("database", category, db));
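
Note that the loop in the question assigns the result of GetTag only to a local variable, so story.Tags still holds the instances created with new Tag. The following EF-free sketch shows that effect in isolation. The simplified Tag class and the TagPool.Canonical helper are hypothetical stand-ins for the entity and for GetTag, and a plain List is assumed instead of an entity collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the Tag entity.
public class Tag
{
    public string Name { get; set; }
}

// Hypothetical stand-in for GetTag: hands out one shared instance per name.
public static class TagPool
{
    private static readonly Dictionary<string, Tag> Known = new Dictionary<string, Tag>();

    public static Tag Canonical(string name)
    {
        Tag tag;
        if (!Known.TryGetValue(name, out tag))
        {
            tag = new Tag { Name = name };
            Known[name] = tag;
        }
        return tag;
    }
}

public static class Program
{
    public static void Main()
    {
        var tags = new List<Tag> { new Tag { Name = ".net" } };
        Tag original = tags[0];

        foreach (Tag tag in tags)
        {
            // Rebinding a local variable leaves the list element untouched.
            Tag currentTag = tag;
            currentTag = TagPool.Canonical(tag.Name);
        }
        Console.WriteLine(object.ReferenceEquals(tags[0], original)); // prints True

        // Only storing the shared instance in the list itself changes its contents.
        for (int i = 0; i < tags.Count; i++)
        {
            tags[i] = TagPool.Canonical(tags[i].Name);
        }
        Console.WriteLine(object.ReferenceEquals(tags[0], TagPool.Canonical(".net"))); // prints True
    }
}
```

Adding the helper's return value to the collection in the first place, as in the usage above, avoids the need to replace anything afterwards.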

Jon answered Sep 19 '22 14:09