
How does the Repository Pattern work if Entities are related to each other?

There is a question about IRepository and what it is used for, that has a seemingly good answer.

My problem though: How would I cleanly deal with entities that are related to each other, and isn't IRepository then just a layer without real purpose?

Let's say I have these business objects:

public class Region {
    public Guid InternalId {get; set;}
    public string Name {get; set;}
    public ICollection<Location> Locations {get; set;}
    public Location DefaultLocation {get; set;}
}

public class Location {
    public Guid InternalId {get; set;}
    public string Name {get; set;}
    public Guid RegionId {get; set;}
}

There are rules:

  • Every Region MUST have at least one location
  • Newly created Regions are created with a location
  • No SELECT N+1 please

So what would my RegionRepository look like?

public class RegionRepository : IRepository<Region>
{
    // Linq To Sql, injected through constructor
    private Func<DataContext> _l2sfactory;

    public ICollection<Region> GetAll(){
         using(var db = _l2sfactory()) {
             return db.GetTable<DbRegion>()
                      .Select(dbr => MapDbObject(dbr))
                      .ToList();
         }
    } 

     private Region MapDbObject(DbRegion dbRegion) {
         if(dbRegion == null) return null;

         return new Region {
            InternalId = dbRegion.ID,
            Name = dbRegion.Name,
            // Locations is EntitySet<DbLocation>
            Locations = dbRegion.Locations.Select(loc => MapLoc(loc)).ToList(),
            // DefaultLocation is EntityRef<DbLocation>
            DefaultLocation = MapLoc(dbRegion.DefaultLocation)
         };
     }

     private Location MapLoc(DbLocation dbLocation) {
         // Where should this come from?
     }
}

So as you see, a RegionRepository needs to fetch locations as well. In my example, I use Linq To Sql EntitySet/EntityRef, but now RegionRepository needs to deal with mapping Locations to Business Objects (because I have two sets of objects, business and L2S objects).

Should I refactor this to something like:

public class RegionRepository : IRepository<Region>
{
    private IRepository<Location> _locationRepo;

    // snip

    private Region MapDbObject(DbRegion dbRegion) {
         if(dbRegion == null) return null;

         return new Region {
            InternalId = dbRegion.ID,
            Name = dbRegion.Name,
            // Now, LocationRepo needs to concern itself with Regions...
            Locations = _locationRepo.GetAllForRegion(dbRegion.ID),
            // DefaultLocation is a uniqueidentifier
            DefaultLocation = _locationRepo.Get(dbRegion.DefaultLocationId)
         };
    }
}

Now I have nicely separated my data layer into atomic repositories, each dealing with only one type. I fire up the Profiler and... Whoops, SELECT N+1, because each Region calls the location service. We only have a dozen regions and 40 or so locations, so the natural optimization is to use DataLoadOptions. The problem is that RegionRepository doesn't know whether LocationRepository is using the same DataContext or not. We are injecting factories here after all, so LocationRepository might spin up its own. And even if it doesn't, I'm calling a service method that provides business objects, so the DataLoadOptions may not be used anyway.
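To make the SELECT N+1 shape concrete, here is a minimal in-memory sketch. It is not Linq To Sql code: `FakeStore`, `RegionLoading` and their methods are hypothetical names, and each call on `FakeStore` just stands in for one database round trip so the round trips can be counted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Location {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public Guid RegionId { get; set; }
}

public class Region {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public ICollection<Location> Locations { get; set; }
}

// Hypothetical stand-in for a database: every Load* call counts as one query.
public class FakeStore {
    public List<Region> RegionRows = new List<Region>();
    public List<Location> LocationRows = new List<Location>();
    public int QueryCount;

    public List<Region> LoadRegions() {
        QueryCount++;
        // Copy the rows so callers never mutate the "table" itself.
        return RegionRows
            .Select(r => new Region { InternalId = r.InternalId, Name = r.Name })
            .ToList();
    }

    public List<Location> LoadLocationsFor(Guid regionId) {
        QueryCount++;
        return LocationRows.Where(l => l.RegionId == regionId).ToList();
    }

    public List<Location> LoadAllLocations() {
        QueryCount++;
        return LocationRows.ToList();
    }
}

public static class RegionLoading {
    // SELECT N+1: one query for the regions, then one more per region.
    public static List<Region> LoadPerRegion(FakeStore store) {
        var regions = store.LoadRegions();
        foreach (var region in regions) {
            region.Locations = store.LoadLocationsFor(region.InternalId);
        }
        return regions;
    }

    // Batched: two queries in total, however many regions there are.
    public static List<Region> LoadBatched(FakeStore store) {
        var regions = store.LoadRegions();
        var byRegion = store.LoadAllLocations().ToLookup(l => l.RegionId);
        foreach (var region in regions) {
            // A lookup returns an empty sequence for keys it does not contain.
            region.Locations = byRegion[region.InternalId].ToList();
        }
        return regions;
    }
}
```

With a dozen regions, `LoadPerRegion` makes 13 simulated round trips and `LoadBatched` makes 2. The batched version only works because a single component sees both sets of rows, which is exactly what two fully separate repositories make hard.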

Ah, I overlooked something. IRepository is supposed to have a method like this:

public IQueryable<T> Query()

So now I would do

         return new Region {
            InternalId = dbRegion.ID,
            Name = dbRegion.Name,
            // Now, LocationRepo needs to concern itself with Regions...
            // (Where filters the locations; Select would only project to bools)
            Locations = _locationRepo.Query()
                        .Where(loc => loc.RegionId == dbRegion.ID)
                        .ToList(),
            // DefaultLocation is a uniqueidentifier
            DefaultLocation = _locationRepo.Get(dbRegion.DefaultLocationId)
         };

That looks good. At first. On second inspection, I have separate business and L2S objects, so I still don't see how this avoids SELECT N+1, since Query cannot just return GetTable<DbLocation>.

The problem seems to be having two different sets of objects. But if I decorate Business Objects with all the System.Data.Linq.Mapping attributes ([Table], [Column] etc.), that breaks the abstraction and defeats the purpose of IRepository. Maybe I also want to be able to use some other ORM, at which point I would have to decorate my Business Entities with yet other attributes (also, if the business entities are in a separate .Business assembly, consumers of it now need to reference all ORMs as well for the attributes to be resolved - yuck!).

To me, it seems that IRepository should be IService, and the above class should look like this:

public class RegionService : IRegionService {
      private Func<DataContext> _l2sfactory;

      public void Create(Region newRegion) {
        // Responsibility 1: Business Validation
        // This could of course move into the Region class as
        // a bool IsValid(), but that doesn't change the fact that
        // the service concerns itself with validation
        if(newRegion.Locations == null || newRegion.Locations.Count == 0){
           throw new Exception("...");
        }

        if(newRegion.DefaultLocation == null){
          newRegion.DefaultLocation = newRegion.Locations.First();
        }

        // Responsibility 2: Data Insertion, incl. Foreign Keys
        using(var db = _l2sfactory()){
            var dbRegion = new DbRegion {
                ...
            };

            // Use EntitySet to insert Locations as well
            foreach(var location in newRegion.Locations){
                var dbLocation = new DbLocation {

                };
                dbRegion.Locations.Add(dbLocation);
            }

            // Insert Region AND all Locations
            // (InsertOnSubmit lives on Table<T>, not on DataContext itself)
            db.GetTable<DbRegion>().InsertOnSubmit(dbRegion);
            db.SubmitChanges();
        }
      }
}

This also solves a chicken-egg problem:

  • DbRegion.ID is generated by the database (as newid()) and IsDbGenerated = true is set
  • DbRegion.DefaultLocationId is a non-nullable GUID
  • DbRegion.DefaultLocationId is a FK into Location.ID
  • DbLocation.RegionId is a non-nullable GUID and a FK into Region.ID

Doing this without EntitySet is pretty much impossible, so unless you sacrifice data integrity on the database and move it into the business logic, it's impossible to keep responsibility for Locations out of the Region provider.

I see how this posting can be seen as not a real question, subjective and argumentative, so please allow me to formulate some objective questions:

  • What exactly is the Repository Pattern supposed to abstract away?
  • In the real world, how do people optimize their database layer without breaking the abstraction the Repository Pattern is supposed to achieve?
  • Specifically, how does the real world deal with SELECT N+1 and data integrity concerns?

I guess my real question is this:

  • When already using an ORM (like Linq To Sql), isn't DataContext already my Repository, and thus a Repository on top of DataContext is just abstracting the very same thing again?
Michael Stum asked Nov 12 '11 08:11



2 Answers

When designing your repositories you should think about what is known as the aggregate root. Essentially this means that if an entity can exist alone within the domain, it will more than likely have its own repository. In your case this would be the Region.

Consider the classic customer/order scenario. The Customer repository would provide access to Orders since an order cannot exist without a customer and therefore unless you have a valid business case for it, you are unlikely to need a separate Order repository.

In a simple application your assumption may well be correct, but remember that unless you provide an abstraction of your L2S context you'll struggle to perform effective unit testing. Coding against an interface, whether that be an IServiceX, an IRepositoryX or whatever, affords you that level of separation.
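As a rough illustration of the aggregate-root idea applied to the question's Region and Location, the sketch below exposes a repository for Region only and handles Locations through it. `IRegionRepository` and `InMemoryRegionRepository` are hypothetical names, and the dictionary merely stands in for whatever data access a real implementation would use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Location {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public Guid RegionId { get; set; }
}

public class Region {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public ICollection<Location> Locations { get; set; }
    public Location DefaultLocation { get; set; }
}

// Hypothetical interface: only the aggregate root (Region) gets a repository.
public interface IRegionRepository {
    Region Get(Guid id);
    ICollection<Region> GetAll();
    void Add(Region region);
}

// In-memory implementation, an assumption made purely for illustration.
public class InMemoryRegionRepository : IRegionRepository {
    private readonly Dictionary<Guid, Region> _regions = new Dictionary<Guid, Region>();

    public Region Get(Guid id) {
        Region region;
        return _regions.TryGetValue(id, out region) ? region : null;
    }

    public ICollection<Region> GetAll() {
        return _regions.Values.ToList();
    }

    public void Add(Region region) {
        if (region == null) throw new ArgumentNullException("region");

        // Rule from the question: every Region must have at least one Location.
        if (region.Locations == null || region.Locations.Count == 0) {
            throw new InvalidOperationException("A Region must have at least one Location.");
        }

        if (region.InternalId == Guid.Empty) region.InternalId = Guid.NewGuid();

        // Locations are stored as part of the Region, never on their own.
        foreach (var location in region.Locations) {
            if (location.InternalId == Guid.Empty) location.InternalId = Guid.NewGuid();
            location.RegionId = region.InternalId;
        }

        if (region.DefaultLocation == null) {
            region.DefaultLocation = region.Locations.First();
        }

        _regions[region.InternalId] = region;
    }
}
```

Because callers depend only on `IRegionRepository`, a class like this could also act as a test double in place of the L2S-backed implementation.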

The decision as to whether Service interfaces come into the design is generally related, again, to the complexity of the business logic and the need for an extensible API into that logic that may be consumed by several disparate clients.

Darren Lewis answered Sep 29 '22 23:09



I have several thoughts about all this:

1. AFAIK the Repository pattern was invented a bit earlier than ORMs. Back in the days of plain SQL queries it was quite a good idea to implement a Repository and thereby abstract your code from the actual database used.
2. I could say that a Repository is not needed at all now, but unfortunately, in my experience no ORM truly abstracts you from all database details. E.g. I have never been able to create an ORM mapping once and just use it with any other DB server that the ORM claims to support (I'm talking particularly about Microsoft EF). So if you really want to be able to use different database servers, you probably still need a Repository.
3. Another concern is very simple: code duplication. Surely there are some queries that you call frequently in your code. If you leave only the ORM as your repository, you'll be duplicating those queries, so it is better to have some level of abstraction over the ORM container that holds those commonly used queries.
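One way to keep frequently used queries in a single place is sketched below, under the assumption that the data layer exposes an `IQueryable<Location>`. The extension-method approach and the names `LocationQueries`, `InRegion` and `OrderedByName` are illustrative choices, not something prescribed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Location {
    public Guid InternalId { get; set; }
    public string Name { get; set; }
    public Guid RegionId { get; set; }
}

// Hypothetical home for commonly used queries, so callers do not repeat them.
public static class LocationQueries {
    // All locations belonging to the given region.
    public static IQueryable<Location> InRegion(this IQueryable<Location> source, Guid regionId) {
        return source.Where(l => l.RegionId == regionId);
    }

    // Locations sorted alphabetically by name.
    public static IQueryable<Location> OrderedByName(this IQueryable<Location> source) {
        return source.OrderBy(l => l.Name);
    }
}
```

A caller would then write something like `locations.InRegion(regionId).OrderedByName().ToList()` instead of repeating the `Where`/`OrderBy` pair at every call site.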

Vladimir Perevalov answered Sep 29 '22 21:09
