Entity Framework 6 Code First: what is the best implementation for a base object with 10 child objects?

We have a base object with 10 child objects, using EF6 Code First.

Of those 10 child objects, 5 have only a few (extra) properties and 5 have multiple properties (5 to 20). We implemented this as table-per-type (TPT), so we have one table for the base and one per child (10 child tables).
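
For concreteness, here is roughly how that table-per-type mapping looks in EF6's fluent API (the entity names here are just placeholders, not our real model):

using System.Data.Entity;

public class BaseObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ChildA : BaseObject
{
    public string ExtraProperty { get; set; }
}

public class MyContext : DbContext
{
    public DbSet<BaseObject> BaseObjects { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Each type gets its own table: one for the base, one per child (TPT).
        modelBuilder.Entity<BaseObject>().ToTable("BaseObjects");
        modelBuilder.Entity<ChildA>().ToTable("ChildAs");
    }
}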

This, however, creates HUGE select queries, with CASE expressions and UNIONs all over the place, which also take EF 6 seconds to generate (the first time).

I read about this issue, and the same problem holds in the table-per-concrete-type scenario.

So what we are left with is table-per-hierarchy, but that creates a table with a large number of properties, which doesn't sound great either.

Is there another solution for this?

I thought about maybe skipping the inheritance and creating a union view for when I want to get all the items from all the child objects/records.
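
Something along these lines, for example, created from a code-first migration (the view, table, and column names are just placeholders):

using System.Data.Entity.Migrations;

public partial class AddAllItemsView : DbMigration
{
    public override void Up()
    {
        // A union view over the (now standalone) child tables,
        // so "all items" can be read without any inheritance mapping.
        Sql(@"CREATE VIEW dbo.AllItems AS
              SELECT Id, Name, 'ChildA' AS ItemType FROM dbo.ChildAs
              UNION ALL
              SELECT Id, Name, 'ChildB' AS ItemType FROM dbo.ChildBs");
    }

    public override void Down()
    {
        Sql("DROP VIEW dbo.AllItems");
    }
}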

Any other thoughts?

Michel asked Jun 27 '14

1 Answer

Another solution would be to implement some kind of CQRS pattern, where you have separate databases for writing (command) and reading (query). You could even denormalize the data in the read database so it is very fast.
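
As a rough illustration of that idea (all names here are hypothetical, and the synchronization between the two sides is elided): the write side keeps the normalized inheritance model, while the read side stores one flat row per item:

using System.Data.Entity;

// Write side: the normalized, inheritance-based model
// (BaseObject as in the question).
public class CommandContext : DbContext
{
    public DbSet<BaseObject> BaseObjects { get; set; }
}

// Read side: one flat, denormalized row per item, so reads
// need no CASE/UNION machinery at all.
public class ItemReadModel
{
    public int Id { get; set; }
    public string ItemType { get; set; }
    public string Name { get; set; }
    // ...whatever the screens need, pre-computed on write.
}

public class QueryContext : DbContext
{
    public DbSet<ItemReadModel> Items { get; set; }
}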

Assuming you need at least one normalized model with referential integrity, I think your decision really comes down to Table per Hierarchy (TPH) and Table per Type (TPT). TPH is reported by Alex James of the EF team, and more recently on Microsoft's Data Development site, to have better performance.

Advantages of TPT, and why they're not as important as performance:

1. Greater flexibility, which means the ability to add types without affecting any existing table. Not too much of a concern, because EF migrations make it trivial to generate the required SQL to update existing databases without affecting data (a sketch of such a migration follows this list).

2. Database validation on account of having fewer nullable fields. Not a massive concern, because EF validates data according to the application model. If data is being added by other means, it is not too difficult to run a background script to validate it. Also, TPT and TPC are actually worse for validation when it comes to primary keys, because two sub-class tables could potentially contain the same primary key. You are left with the problem of validation by other means.

3. Storage space is reduced on account of not needing to store all the null fields. This is only a very trivial concern, especially if the DBMS has a good strategy for handling 'sparse' columns.

4. Design and gut feel. Having one very large table does feel a bit wrong, but that is probably because most db designers have spent many hours normalizing data and drawing ERDs. Having one large table seems to go against the basic principles of database design. This is probably the biggest barrier to TPH. See this article for a particularly impassioned argument.
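
On the first point, a generated EF6 migration for adding a sub-type's properties might look roughly like this (the class, table, and column names are hypothetical):

using System.Data.Entity.Migrations;

public partial class AddInvoiceProperties : DbMigration
{
    public override void Up()
    {
        // Under TPH, a new sub-type's properties arrive as
        // nullable columns on the single hierarchy table.
        AddColumn("dbo.Contents", "InvoiceNumber", c => c.String());
    }

    public override void Down()
    {
        DropColumn("dbo.Contents", "InvoiceNumber");
    }
}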

The article mentioned in point 4 summarizes the core argument against TPH as:

It's not normalized even in a trivial sense, it makes it impossible to enforce integrity on the data, and what's most "awesome:" it is virtually guaranteed to perform badly at a large scale for any non-trivial set of data.

These are mostly wrong. Performance and integrity are addressed above, and TPH does not necessarily mean denormalized; there are just many (nullable) foreign key columns that are self-referential. So we can go on designing and normalizing the data exactly as we otherwise would, even with a TPH implementation.

In a current database I have many relationships between sub-types, and I have created an ERD as if it were a TPT inheritance structure. This actually reflects the implementation in code-first Entity Framework. For example, here is my Expenditure class, which inherits from Relationship, which in turn inherits from Content:

using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Expenditure : Relationship
{
    /// <summary>
    /// Inherits from Content: Id, Handle, Description, Parent (the context of the expenditure,
    /// usually a Project).
    /// Inherits from Relationship: Source (the Principal), SourceId, Target (the Supplier), TargetId.
    /// </summary>
    [Required, InverseProperty("Expenditures"), ForeignKey("ProductId")]
    public Product Product { get; set; }
    public Guid ProductId { get; set; }

    public string Unit { get; set; }
    public double Qty { get; set; }
    public string Currency { get; set; }
    public double TotalCost { get; set; }
}

The InversePropertyAttribute and the ForeignKeyAttribute provide EF with the information it needs to make the self joins within the single table.

The Product type also maps to the same table (also inheriting from Content). Each Product has its own row in the table and rows that contain Expenditures will include data in the ProductId column, which is null for rows containing all other types. So the data is normalized, just placed in a single table.
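
For context, the base classes implied by the doc comment above might look something like this. This is a reconstruction from the comment, not the author's actual code:

using System;
using System.Collections.Generic;

public class Content
{
    public Guid Id { get; set; }
    public string Handle { get; set; }
    public string Description { get; set; }
    public Content Parent { get; set; }
}

public class Relationship : Content
{
    public Guid SourceId { get; set; }
    public Content Source { get; set; }   // e.g. the Principal
    public Guid TargetId { get; set; }
    public Content Target { get; set; }   // e.g. the Supplier
}

public class Product : Content
{
    // The inverse side named by [InverseProperty("Expenditures")] above.
    public ICollection<Expenditure> Expenditures { get; set; }
}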

The beauty of using EF Code First is that we design the database in exactly the same way, and we implement it in (almost) exactly the same way, regardless of using TPH or TPT. To change the implementation from TPH to TPT we simply need to add an annotation to each sub-class, mapping it to a new table. So the good news for you is that it doesn't really matter which one you choose. Just build it, generate a stack of test data, test it, change strategy, and test it again. I reckon you'll find TPH the winner.
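
Concretely, that switch is just a [Table] annotation on each sub-class; with no table mapping at all, EF6 defaults to TPH (the table name below is just illustrative):

using System.ComponentModel.DataAnnotations.Schema;

// Adding [Table] to each sub-class moves it out of the single
// hierarchy table, turning the model into TPT.
[Table("Expenditures")]
public class Expenditure : Relationship
{
    // ...same members as above...
}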

dave walker answered Sep 27 '22