
Implementing the Repository Pattern Correctly with EF Core

NOTE

I'm not asking whether I should use the Repository pattern; I care about the how. Injecting persistence-related objects into domain classes is not an option for me: it makes unit testing impossible (and no, tests using in-memory databases are NOT unit tests, as they cover many different classes without isolation), it couples the domain logic with the ORM, and it breaks many important principles I practice, such as Persistence Ignorance and Separation of Concerns, whose benefits you're welcome to search online. Using EF Core "correctly" is not nearly as important to me as keeping the business logic isolated from external concerns, which is why I'll settle for a "hacky" usage of EF Core if it means the Repository won't be a leaky abstraction anymore.

Original Question

Let's assume the repository's interface is the following:

public interface IRepository<TEntity>
    where TEntity : Entity
{
    void Add(TEntity entity);
    void Remove(TEntity entity);
    Task<TEntity?> FindByIdAsync(Guid id);
}

public abstract class Entity
{
    public Entity(Guid id)
    {
        Id = id;
    }
    public Guid Id { get; }
}

Most of the EF Core implementations I saw online did something like:

public class EFCoreRepository<TEntity> : IRepository<TEntity>
    where TEntity : Entity
{
    private readonly DbSet<TEntity> entities;

    public EFCoreRepository(DbContext dbContext)
    {
        entities = dbContext.Set<TEntity>();
    }

    public void Add(TEntity entity)
    {
        entities.Add(entity);
    }

    public void Remove(TEntity entity)
    {
        entities.Remove(entity);
    }

    public async Task<TEntity?> FindByIdAsync(Guid id)
    {
        return await entities.FirstOrDefaultAsync(e => e.Id == id);
    }
}

The changes are committed in another class, in an implementation of the Unit of Work pattern. The problem I have with this implementation is that it violates the definition of a repository as a "collection-like" object. Users of this class would have to know that the data is persisted in an external store and call the Save() method themselves. The following snippet won't work:

var entity = new ConcreteEntity(id: Guid.NewGuid());
repository.Add(entity);
var result = await repository.FindByIdAsync(entity.Id); // Will return null

The changes should obviously not be committed after every call to Add(), because it defeats the purpose of the Unit of Work, so we end up with a weird, not very collection-like interface for the repository. In my mind, we should be able to treat a repository exactly like we would treat a regular in-memory collection:

var list = new List<ConcreteEntity>();
var entity = new ConcreteEntity(id: Guid.NewGuid());
list.Add(entity);
// No need to save here
var result = list.FirstOrDefault(e => e.Id == entity.Id);
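
For comparison, the collection-like behaviour described here is what a purely in-memory implementation of the interface gives for free. The following is only a minimal sketch of such a test double; `InMemoryRepository`, `ConcreteEntity` and `InMemoryRepositoryDemo` are hypothetical names, and `Entity` and `IRepository<TEntity>` are repeated so the block compiles on its own.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public abstract class Entity
{
    protected Entity(Guid id) => Id = id;
    public Guid Id { get; }
}

public interface IRepository<TEntity> where TEntity : Entity
{
    void Add(TEntity entity);
    void Remove(TEntity entity);
    Task<TEntity?> FindByIdAsync(Guid id);
}

// Hypothetical test double: keeps entities in a dictionary keyed by Id.
public sealed class InMemoryRepository<TEntity> : IRepository<TEntity>
    where TEntity : Entity
{
    private readonly Dictionary<Guid, TEntity> items = new();

    // Throws ArgumentException if an entity with the same Id was already added.
    public void Add(TEntity entity) => items.Add(entity.Id, entity);

    public void Remove(TEntity entity) => items.Remove(entity.Id);

    public Task<TEntity?> FindByIdAsync(Guid id)
    {
        items.TryGetValue(id, out var found);
        return Task.FromResult(found);
    }
}

public sealed class ConcreteEntity : Entity
{
    public ConcreteEntity(Guid id) : base(id) { }
}

public static class InMemoryRepositoryDemo
{
    // Adds an entity and reads it back without any explicit "save" step.
    public static async Task<bool> AddThenFindAsync()
    {
        var repository = new InMemoryRepository<ConcreteEntity>();
        var entity = new ConcreteEntity(Guid.NewGuid());
        repository.Add(entity);
        var result = await repository.FindByIdAsync(entity.Id);
        return ReferenceEquals(entity, result);
    }
}
```

Here `FindByIdAsync` returns the instance that was just added, which is the behaviour the EF Core implementation is expected to match.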

When the transaction scope ends, the changes can be committed to the DB, but apart from the low-level code that deals with the transaction, I don't want the domain logic to care about when the transaction is committed. What we can do to implement the interface in this fashion is to use the DbSet's Local collection in addition to the regular DB query. That would be:

...
public async Task<TEntity?> FindByIdAsync(Guid id)
{
    var entity = entities.Local.FirstOrDefault(e => e.Id == id);
    return entity ?? await entities.FirstOrDefaultAsync(e => e.Id == id);
}

This works, but this generic implementation would then be derived in concrete repositories with many other methods that query data. All of these queries will have to be implemented with the Local collection in mind, and I haven't found a clean way to enforce concrete repositories not to ignore local changes. So my question really boils down to:

  1. Is my interpretation of the Repository pattern correct? Why is there no mention of this problem in other implementations online? Even Microsoft's implementation (which is a bit outdated, but the idea is the same) in the official documentation website ignores local changes when querying.
  2. Is there a better solution to include local changes in EF Core than manually querying both the DB and the Local collection every time?

UPDATE - My Solution

I ended up implementing the second solution suggested by @Ronald's answer. I made the repository save the changes to the database automatically, and wrapped every request in a database transaction. One thing I changed from the proposed solution is that I called SaveChangesAsync on every read, not write. This is similar to what Hibernate already does (in Java). Here is a simplified implementation:

public abstract class EFCoreRepository<TEntity> : IRepository<TEntity>
    where TEntity : Entity
{
    private readonly DbSet<TEntity> dbSet;
    public EFCoreRepository(DbContext dbContext)
    {
        dbSet = dbContext.Set<TEntity>();
        Entities = new EntitySet<TEntity>(dbContext);
    }

    protected IQueryable<TEntity> Entities { get; }

    public void Add(TEntity entity)
    {
        dbSet.Add(entity);
    }

    public async Task<TEntity?> FindByIdAsync(Guid id)
    {
        return await Entities.SingleOrDefaultAsync(e => e.Id == id);
    }

    public void Remove(TEntity entity)
    {
        dbSet.Remove(entity);
    }
}

internal class EntitySet<TEntity> : IQueryable<TEntity>
    where TEntity : Entity
{
    private readonly DbSet<TEntity> dbSet;
    public EntitySet(DbContext dbContext)
    {
        dbSet = dbContext.Set<TEntity>();
        Provider = new AutoFlushingQueryProvider<TEntity>(dbContext);
    }

    public Type ElementType => dbSet.AsQueryable().ElementType;

    public Expression Expression => dbSet.AsQueryable().Expression;

    public IQueryProvider Provider { get; }

    // GetEnumerator() omitted...
}

internal class AutoFlushingQueryProvider<TEntity> : IAsyncQueryProvider
    where TEntity : Entity
{
    private readonly DbContext dbContext;
    private readonly IAsyncQueryProvider internalProvider;

    public AutoFlushingQueryProvider(DbContext dbContext)
    {
        this.dbContext = dbContext;
        var dbSet = dbContext.Set<TEntity>().AsQueryable();
        internalProvider = (IAsyncQueryProvider)dbSet.Provider;
    }
    public TResult ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken = default)
    {
        var internalResultType = typeof(TResult).GenericTypeArguments.First();

        // Calls this.ExecuteAsyncCore<internalResultType>(expression, cancellationToken)
        object? result = GetType()
            .GetMethod(nameof(ExecuteAsyncCore), BindingFlags.NonPublic | BindingFlags.Instance)
            ?.MakeGenericMethod(internalResultType)
            ?.Invoke(this, new object[] { expression, cancellationToken });

        if (result is not TResult)
            throw new Exception(); // This should never happen

        return (TResult)result;
    }

    private async Task<TResult> ExecuteAsyncCore<TResult>(Expression expression, CancellationToken cancellationToken)
    {
        await dbContext.SaveChangesAsync(cancellationToken);
        return await internalProvider.ExecuteAsync<Task<TResult>>(expression, cancellationToken);
    }

    // Other interface methods omitted...
}

Notice the use of IAsyncQueryProvider, which forced me to use a small Reflection hack. This was required to support the asynchronous LINQ methods that come with EF Core.
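
The Reflection step can be looked at in isolation. The sketch below has no EF Core dependency and only illustrates the technique: the caller supplies `TResult = Task<T>`, the inner `T` is read at run time, and a private generic method is closed over it with `MakeGenericMethod`. `GenericDispatcher`, `Execute`, `ExecuteCore` and `DispatchDemo` are hypothetical names, and the `int` input stands in for the expression argument.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

public sealed class GenericDispatcher
{
    // TResult is expected to be Task<T> for some T, which is the shape
    // IAsyncQueryProvider.ExecuteAsync<TResult> is called with in the code above.
    public TResult Execute<TResult>(int input)
    {
        // For TResult = Task<long> this yields typeof(long).
        // First() throws if TResult is not a constructed generic type.
        Type inner = typeof(TResult).GenericTypeArguments.First();

        // Equivalent to calling this.ExecuteCore<inner>(input).
        MethodInfo core = typeof(GenericDispatcher)
            .GetMethod(nameof(ExecuteCore), BindingFlags.NonPublic | BindingFlags.Instance)!
            .MakeGenericMethod(inner);

        object? result = core.Invoke(this, new object[] { input });

        return result is TResult typed
            ? typed
            : throw new InvalidOperationException($"Expected {typeof(TResult)}.");
    }

    private async Task<T> ExecuteCore<T>(int input)
    {
        await Task.Yield(); // stand-in for awaited work such as SaveChangesAsync
        return (T)Convert.ChangeType(input, typeof(T));
    }
}

public static class DispatchDemo
{
    public static Task<long> RunAsync() =>
        new GenericDispatcher().Execute<Task<long>>(42);
}
```

The cost of this approach is one reflective lookup per call; caching the closed `MethodInfo` per result type would be a possible refinement, which is not shown here.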

asked Nov 22 '20 17:11 by Gur Galler



3 Answers

It seems there is a misconception about Repositories and Entities here. First of all, DDD's Entity and Entity Framework's entity are slightly different concepts. In DDD, an Entity is basically a way of keeping track of the evolution of a business concept instance over time, whereas in Entity Framework, an entity is merely a persistence concern.

The repository pattern, from a DDD point of view, won't manipulate Entities directly, but rather Aggregates. Yeah, cool story bro, but what does it change? Long story short, an aggregate can be seen as a transactional boundary that protects strict Domain Invariants: invariants that must comply with transactional consistency, as opposed to eventual consistency. A repository, from a DDD perspective, will fetch an instance of an Aggregate, that is, an object rooted by a DDD Entity called the Aggregate Root, with optional Entities and Value Objects within it.
With EF, a Repository will do the heavy lifting, fetching data from one or more SQL tables and relying on a Factory to provide a fully instantiated and ready-to-use Aggregate. It will also do the transactional work needed to save the Aggregate (and its internal components) in a structured, relational fashion in the DB. But Aggregates shouldn't know about the repository. The core model doesn't care about any persistence details. Aggregate usage belongs to the "Application Layer" or the "Use Case" layer, not the Domain layer.

Let's wrap it up. Let's say you want to implement a DDD repository in a thin ASP.NET app:

class OrderController
{
    private IOrderRepository _orderRepository;

    public OrderController(IOrderRepository orderRepository)
    {
        _orderRepository = orderRepository;
    }

    public async Task PlaceOrder(Guid orderId)
    {
        var aggregate = await _orderRepository.FindByIdAsync(orderId);
        aggregate.PlaceOrder();
        await _orderRepository.Save();
    }
}

internal interface IOrderRepository
{
    void Add(Order order);
    void Remove(Order order);
    Task<Order> FindByIdAsync(Guid id);
    Task Save();
}

internal class Order
{
    public Guid Id { get; }

    private IList<Item> items;
    public static Order CreateOrder(IList<Item> items)
    {
        return new Order(items);
    }

    private Order(IList<Item> items)
    {
        this.Id = Guid.NewGuid();
        this.items = items;
    }

    public void PlaceOrder()
    {
        // do stuff with the aggregate's status and items list
    }
}

What happens here? The controller is the "Use Case" layer: it's responsible for fetching the aggregate (the Aggregate Root) from the repo, making the Aggregate do its job, and then telling the repo to save its changes. It could be more transparent with a unit of work in the controller that saves the injected DbContext (because the concrete repo will have to access different DbSets: Order and Items).
But you get the idea. You may also want to keep one data-access class per table, but it would be used by the Aggregate-dedicated Repository.
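
The unit-of-work variant mentioned above could look roughly like the following sketch. It is not taken from any framework: `IUnitOfWork`, `PlaceOrderUseCase`, `InMemoryOrderRepository` and `CountingUnitOfWork` are hypothetical names, the `Order` class is a simplified stand-in for the one above, and an EF Core implementation of `CommitAsync` would presumably forward to the context's `SaveChangesAsync`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class Order
{
    public Order(Guid id) => Id = id;
    public Guid Id { get; }
    public bool IsPlaced { get; private set; }
    public void PlaceOrder() => IsPlaced = true;
}

// No Save() here: when changes are persisted is not the repository's concern.
public interface IOrderRepository
{
    void Add(Order order);
    Task<Order?> FindByIdAsync(Guid id);
}

// Hypothetical abstraction over "commit everything changed in this use case".
public interface IUnitOfWork
{
    Task CommitAsync(CancellationToken cancellationToken = default);
}

public sealed class PlaceOrderUseCase
{
    private readonly IOrderRepository orders;
    private readonly IUnitOfWork unitOfWork;

    public PlaceOrderUseCase(IOrderRepository orders, IUnitOfWork unitOfWork)
    {
        this.orders = orders;
        this.unitOfWork = unitOfWork;
    }

    // Returns false when the order does not exist; commits once otherwise.
    public async Task<bool> ExecuteAsync(Guid orderId, CancellationToken cancellationToken = default)
    {
        var order = await orders.FindByIdAsync(orderId);
        if (order is null) return false;

        order.PlaceOrder();
        await unitOfWork.CommitAsync(cancellationToken);
        return true;
    }
}

// In-memory fakes, enough to exercise the use case in a unit test.
public sealed class InMemoryOrderRepository : IOrderRepository
{
    private readonly Dictionary<Guid, Order> items = new();
    public void Add(Order order) => items.Add(order.Id, order);
    public Task<Order?> FindByIdAsync(Guid id) =>
        Task.FromResult(items.TryGetValue(id, out var order) ? order : null);
}

public sealed class CountingUnitOfWork : IUnitOfWork
{
    public int Commits { get; private set; }
    public Task CommitAsync(CancellationToken cancellationToken = default)
    {
        Commits++;
        return Task.CompletedTask;
    }
}
```

With this split the use case can be tested against the two fakes, and neither the aggregate nor the repository interface mentions saving.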

Hope it was clear enough

answered Oct 21 '22 18:10 by Oinant


Merging the result sets of the same query run against different datasets doesn't work in general.

It's pretty straightforward if you only have local inserts and only use where and select in your queries, because then the merge operation is just an append.
It gets increasingly difficult as you try to support more operators like order by, skip & take, and group by, as well as local updates and deletions.

In particular, there's no other way to support group by with local updates and deletions than to merge both data sources first and then apply the group by.

Doing this in your app is going to be unfeasible because it would mean retrieving the whole table, applying local changes and then doing the group by.

Something that might work is to transfer your local changes to the database instead and run the query there.

There are two ways that I can think of to achieve this.
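
To make the "merge first, then group" order concrete, here is a small LINQ-to-Objects sketch of what doing it in the app would involve. The `Employee` record, the `LocalMerge` helper and the sample rows (including Bob and Ann) are hypothetical and only mirror the employees/salary example used further down; every stored row has to be loaded into memory before the grouping can run, which is exactly why this does not scale.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Employee(int Id, string Name, decimal Salary, int DivisionId);

public static class LocalMerge
{
    // Applies pending changes to the stored rows: drops rows whose ids were
    // deleted or updated, then appends inserted rows and the new versions
    // of updated rows.
    public static IEnumerable<Employee> Merge(
        IEnumerable<Employee> stored,
        IReadOnlyCollection<int> removedOrUpdatedIds,
        IEnumerable<Employee> addedOrUpdated) =>
        stored.Where(e => !removedOrUpdatedIds.Contains(e.Id)).Concat(addedOrUpdated);

    // The group by only gives correct totals when it runs on the merged rows.
    public static Dictionary<int, decimal> SalaryByDivision(IEnumerable<Employee> rows) =>
        rows.GroupBy(e => e.DivisionId)
            .ToDictionary(g => g.Key, g => g.Sum(e => e.Salary));

    public static Dictionary<int, decimal> Example()
    {
        var stored = new[]
        {
            new Employee(1, "John", 100000m, 1),
            new Employee(2, "Bob", 50000m, 1),
            new Employee(3, "Ann", 70000m, 2),
        };
        var removedOrUpdatedIds = new[] { 1, 2 }; // John updated, Bob deleted
        var addedOrUpdated = new[]
        {
            new Employee(1, "John", 200000m, 1),  // new version of John
            new Employee(99, "Jane", 300000m, 1), // local insert
        };

        // Division 1 -> 500000, division 2 -> 70000.
        return SalaryByDivision(Merge(stored, removedOrUpdatedIds, addedOrUpdated));
    }
}
```

Grouping the stored rows alone would give 150000 for division 1, and there is no way to patch that total afterwards without knowing which stored rows contributed to it.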

Transforming queries

Transform your queries to include local changes by replacing their from clause

so a query like

select sum(salary) from employees group by division_id

would become

select
    sum(salary) 
from 
(
    select 
        id, name, salary, division_id 
    from employees
    -- remove deleted and updated records
    where id not in (1, 2)
    -- add inserted records and new versions of updated records
    union all values (1, 'John', 200000, 1), (99, 'Jane', 300000, 1)
) _
group by division_id

This should also work for joins if you apply the same transformation to the joined tables.
It would require some pretty involved customization to do this with EF, though.

Here is an idea of how to implement it at least partially with EF. It won't support joins and unfortunately involves some manual SQL generation.

static IQueryable<T> WithLocal<T>(this DbContext db)
    where T : Entity
{
    var set = db.Set<T>();
    var changes = db.ChangeTracker.Entries<T>();
    var model = db.Model.FindEntityType(typeof(T));

    var deletions = changes
        .Where(change => change.State == EntityState.Deleted)
        .Select(change => change.Entity.Id);
        
    return set
        // Hard part left as an exercise for the reader :)
        // Generate this from 'changes' and 'model', you can use parameters for the values
        .FromSqlRaw("select 1 as id, 'John' as name, 200000 as salary, 1 as division_id union all select 99 as id, 'Jane' as name, 300000 as salary, 1 as division_id")
        .Union(set.Where(entity => !deletions.Contains(entity.Id)));
}

You can then use it like so:

var query = db.WithLocal<Employee>()
    .GroupBy(employee => employee.DivisionId)
    .Select(group => group.Sum(employee => employee.Salary));

Keeping a transaction open

A simpler way is to do the writes to the database without committing the transaction. All the queries that you run on the same transaction will then see the changes, but no one else will. At the end of the request you can commit or roll back from outside of your repositories.

With this approach your queries will also see database generated values like computed columns, auto increment ids and trigger generated values.
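
A rough sketch of the transaction-per-request idea with EF Core follows. It assumes the `Microsoft.EntityFrameworkCore` package (3.0 or later) and a relational provider that supports transactions; `TransactionScopeRunner` and `RunInTransactionAsync` are hypothetical names, and the repositories are assumed to call `SaveChangesAsync` themselves after each write so that later queries on the same context see the data.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class TransactionScopeRunner
{
    // Runs one use case inside a single database transaction.
    // Writes flushed with SaveChangesAsync during the use case are visible to
    // queries on the same context, but are only committed at the end.
    public static async Task RunInTransactionAsync(
        DbContext dbContext,
        Func<Task> useCase,
        CancellationToken cancellationToken = default)
    {
        await using var transaction =
            await dbContext.Database.BeginTransactionAsync(cancellationToken);
        try
        {
            await useCase();

            // Flush anything still pending, then make it all permanent.
            await dbContext.SaveChangesAsync(cancellationToken);
            await transaction.CommitAsync(cancellationToken);
        }
        catch
        {
            await transaction.RollbackAsync(cancellationToken);
            throw;
        }
    }
}
```

In a web application this would typically be called from a middleware or filter around the request, so the repositories and the domain code never see the transaction.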


I have never tried this and can't speak for the performance implications of these approaches, but if you need this feature I don't think there are many other ways.

answered Oct 21 '22 19:10 by Roald


You can look into the repository implementation approach used in the Microsoft-powered eShopOnWeb project:

According to the rules of Domain-driven design a repository is dedicated to handle a collection of aggregates. The interface in this sample solution looks like the following:

public interface IAsyncRepository<T> where T : BaseEntity, IAggregateRoot
{
    Task<T> GetByIdAsync(int id, CancellationToken cancellationToken = default);
    Task<IReadOnlyList<T>> ListAllAsync(CancellationToken cancellationToken = default);
    Task<IReadOnlyList<T>> ListAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
    Task<T> AddAsync(T entity, CancellationToken cancellationToken = default);
    Task UpdateAsync(T entity, CancellationToken cancellationToken = default);
    Task DeleteAsync(T entity, CancellationToken cancellationToken = default);
    Task<int> CountAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
    Task<T> FirstAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
    Task<T> FirstOrDefaultAsync(ISpecification<T> spec, CancellationToken cancellationToken = default);
}

The interface itself resides in the domain layer (here in this project called application core).

The concrete repository implementations (here for EF Core) reside in the infrastructure layer.

There is a generic EFCore repository implementation for covering common repository methods:

public class EfRepository<T> : IAsyncRepository<T> where T : BaseEntity, IAggregateRoot
{
    protected readonly CatalogContext _dbContext;

    public EfRepository(CatalogContext dbContext)
    {
        _dbContext = dbContext;
    }

    public virtual async Task<T> GetByIdAsync(int id, CancellationToken cancellationToken = default)
    {
        var keyValues = new object[] { id };
        return await _dbContext.Set<T>().FindAsync(keyValues, cancellationToken);
    }

    public async Task<T> AddAsync(T entity, CancellationToken cancellationToken = default)
    {
        await _dbContext.Set<T>().AddAsync(entity);
        await _dbContext.SaveChangesAsync(cancellationToken);

        return entity;
    }

    public async Task UpdateAsync(T entity, CancellationToken cancellationToken = default)
    {
        _dbContext.Entry(entity).State = EntityState.Modified;
        await _dbContext.SaveChangesAsync(cancellationToken);
    }

    public async Task DeleteAsync(T entity, CancellationToken cancellationToken = default)
    {
        _dbContext.Set<T>().Remove(entity);
        await _dbContext.SaveChangesAsync(cancellationToken);
    }
}

I just referenced some of the methods here.

For more specific repository methods that fit your requirements, you can define more specific repository interfaces in the domain layer. These are again implemented in the infrastructure layer, by a class that builds on the generic repository and implements that specific interface. See here for an example (although the method provided is not the best example, I think you can get the idea).

With this approach actual saving to the database is completely handled by the repository implementation and not part of the repository interface.

Transactions, on the other hand, should be in neither the domain layer nor the repository implementation. So if you need several aggregate updates to be consistent within the same use case, this transaction handling should be done in the application layer.

This also fits with Eric Evans's rule from his book Domain-Driven Design:

Leave transaction control to the client. Although the REPOSITORY will insert into and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.

See Chapter Six, Repositories.

answered Oct 21 '22 18:10 by afh