Find a perfect, flexible schema for storing many different types of objects with a wide variety of links between them in a relational database.
EAV is a workaround for the normal constraints of an RDBMS.
If you were to normalize an EAV schema, it would be ugly.
Does the fact that we traditionally maintain these schema by hand limit their complexity and power?
But if it was maintained and queried programmatically, what would it matter?
If you have n different entities in n different tables, why not let your code generate the n(n+1)/2 link tables and the queries between them? Would this not result in a true graph in a normalized schema?
In a highly interlinked database there will always be far more edges than vertices (quadratically more, in the worst case). Why not focus on creating proper, normalized vertices (the n entity tables) and let our code maintain the edges (the n(n+1)/2 link tables)?
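As a minimal sketch of the generation idea (all table and column names here are hypothetical): given n entity tables, code can emit the n(n+1)/2 pairwise link-table definitions, self-links included.

```python
import sqlite3
from itertools import combinations_with_replacement

def link_table_ddl(entities):
    """Generate CREATE TABLE statements for every pairwise link table.

    For n entities this yields n(n+1)/2 tables (self-links included).
    Column names are fixed as from_id/to_id so self-link tables are valid.
    """
    ddl = []
    for a, b in combinations_with_replacement(sorted(entities), 2):
        ddl.append(
            f"CREATE TABLE {a}_{b} (\n"
            f"    from_id INTEGER NOT NULL REFERENCES {a}(id),\n"
            f"    to_id   INTEGER NOT NULL REFERENCES {b}(id),\n"
            f"    PRIMARY KEY (from_id, to_id)\n"
            f");"
        )
    return ddl

entities = ["Customer", "Invoice", "Product"]
statements = link_table_ddl(entities)
print(len(statements))  # 3 entities -> 3*(3+1)/2 = 6 link tables

# Sanity-check that the generated DDL is accepted by SQLite.
con = sqlite3.connect(":memory:")
for e in entities:
    con.execute(f"CREATE TABLE {e} (id INTEGER PRIMARY KEY)")
for s in statements:
    con.execute(s)
```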
Can a system normalize EAV and maintain the resulting complex schema?
Can complex graphs be stored in (and remain true to) relational databases?
I'm sure this has been done before, but I've never seen it. What am I missing?
Storing printed works and their bibliographic data
"What problem are you trying to solve?"
-Piet
I'm looking for a normalized solution to EAV, graphs, and polymorphic relationships in a relational database system.
"I would hate to be the guy who has to understand or maintain it after it's gone into production."
-Andrew
This "traditional maintenance" is the exact thing I'm saying we should be automating. Isn't it largely grunt work?
To convert a graph data structure into a relational one, you store Nodes (vertices) in one table and Edges, each referencing a FromNode and a ToNode, in another. You are also right that this results in a large number of lookups, because you cannot partition the graph into subgraphs that could be queried at once.
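A minimal sketch of that node/edge layout in SQLite (table and column names are illustrative): each hop through the graph becomes another self-join on the Edge table, which is where the lookup cost comes from.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Node (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE Edge (
        from_node INTEGER NOT NULL REFERENCES Node(id),
        to_node   INTEGER NOT NULL REFERENCES Node(id),
        PRIMARY KEY (from_node, to_node)
    );
""")
con.executemany("INSERT INTO Node VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "c")])
con.executemany("INSERT INTO Edge VALUES (?, ?)",
                [(1, 2), (2, 3)])

# Each hop is one more self-join on Edge -- the "large number of lookups".
two_hops = con.execute("""
    SELECT n.name
    FROM Edge e1
    JOIN Edge e2 ON e2.from_node = e1.to_node
    JOIN Node n  ON n.id = e2.to_node
    WHERE e1.from_node = 1
""").fetchall()
print(two_hops)  # [('c',)]
```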
The relational focus is on relationships between the columns of data tables, not between individual data points. Both kinds of database make adding new data easy. The flexibility of a graph database lies in adding new nodes and new relationships between nodes, which makes it well suited to real-time data.
Graph databases store data much as object-oriented languages do: each object maintains a collection of the other objects it is related to. These references behave like in-memory pointers, so we do not have to find the related object via a foreign-key attribute.
In a graph database, relationships are stored at the individual record level, while a relational database uses predefined structures, a.k.a. table definitions. Relational databases are faster when handling huge numbers of records because the structure of the data is known ahead of time.
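The contrast can be illustrated in plain Python (a sketch, not any particular database's API): the graph/OO style follows a direct reference, while the relational style must look each neighbour up by key in a separate structure.

```python
# Graph/OO style: each record carries direct references to related records.
class Person:
    def __init__(self, name):
        self.name = name
        self.friends = []   # direct references, no key lookup needed

alice, bob = Person("Alice"), Person("Bob")
alice.friends.append(bob)
print(alice.friends[0].name)   # follow the pointer: Bob

# Relational style: relationships live in a separate link structure,
# and every traversal is a lookup by foreign key.
people  = {1: "Alice", 2: "Bob"}   # Person "table", keyed by id
friends = [(1, 2)]                 # link "table" of (from_id, to_id)
names = [people[t] for f, t in friends if f == 1]
print(names)                       # ['Bob']
```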
Since you are editing the question, it must be active.
Yes, there are much better ways of designing this, for the purpose and use you describe.
The first issue is EAV, which is usually implemented very badly. More precisely, the EAV crowd, and therefore the EAV literature, is not of high quality, and standards are not maintained; as a result the basic integrity and quality of a Relational Database is lost, which leads to the many well-documented problems.
You should consider the proper, academically derived alternative, which retains full Relational integrity and capability: Sixth Normal Form (6NF). EAV is in fact a subset of 6NF, implemented without the full understanding; it is the more commonly known rendition of 6NF.
6NF implemented correctly is particularly fast, in that it stores columns, not rows. You can therefore map your data (graph series, data points) in such a way as to get uniformly high speed regardless of the vectors you use to access the graphs. (You can also eliminate duplication to a higher order than 5NF, but that is advanced use.)
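A sketch of what 6NF storage looks like in practice (schema and names are illustrative, not from the answer's linked material): each non-key attribute gets its own table keyed by the entity's identifier, so the database effectively stores columns, and a missing value is simply an absent row rather than a NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# In 6NF every relation is the key plus at most one non-key attribute.
con.executescript("""
    CREATE TABLE Product      (id INTEGER PRIMARY KEY);
    CREATE TABLE ProductName  (id INTEGER PRIMARY KEY REFERENCES Product(id),
                               name  TEXT NOT NULL);
    CREATE TABLE ProductPrice (id INTEGER PRIMARY KEY REFERENCES Product(id),
                               price REAL NOT NULL);
""")
con.execute("INSERT INTO Product VALUES (1)")
con.execute("INSERT INTO ProductName VALUES (1, 'Widget')")
# No price row inserted: the attribute is simply absent, no NULL stored.

# Reassembling the "row" view is a set of left joins over the attribute tables.
row = con.execute("""
    SELECT p.id, n.name, pr.price
    FROM Product p
    LEFT JOIN ProductName  n  ON n.id  = p.id
    LEFT JOIN ProductPrice pr ON pr.id = p.id
""").fetchone()
print(row)  # (1, 'Widget', None)
```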
"Highly-interlinked" is not a problem at all. That is the nature of a Relational Database. The caveat here is, it must be truly Normalised, not a inlerlinked bunch of flat files.
The automation or code generation is not a problem. Of course, you need to extend the SQL catalogue, and ensure it is table-driven, if you want quality and maintainability.
My answers to these questions provide a full treatment of the subject. The last one is particularly long, due to the context and arguments raised.
EAV-6NF Answer One
EAV-6NF Answer Two
EAV-6NF Answer Three
And this one is worthwhile as well:
Schema-Related Problem
Your idea would certainly create a completely flexible schema that can represent any kind of object graph. I would hate to be the guy who has to understand or maintain it after it's gone into production.
One benefit of a well-designed data schema is its constraints. I'm not just referring to the physical column constraints you can define, but to the constraints imposed by the overall structure: there is a fixed set of explicit relationships, which provides well-defined paths to follow.
In your scenario, there would always be a large number of paths from one entity to another. How would somebody know which path was the "right" path? The "right" path will simply be "the set of relationships the developer chose to populate".
Imagine a database that has these relationships.
Customer <===> Invoice <===> InvoiceLineItem <===> Product
If I'm looking at this, and somebody asks me, "Give me a list of customers, and for each customer a list of the products they've bought", I would know how to write the query.
But, if this was a graph where everything pointed to everything else, how will I know which path is the "right" path. Will it be the "Customer_Product" relationship, the "Customer_Invoice_Line_Item" to "Customer_Product", or "Customer_Invoice" to "Invoice_Product", or "Customer" to "Invoice" to "Invoice_Line_Item" to "SomeOtherTableIHaven'tEvenLookedAtYet" to "Product"? The answer can be "It should be obvious", but it is very common for something to be obvious to one developer only.