I need to store a set of entities, of which there are several specialized versions. They have some common properties, but the specialized ones contain properties specific for that entity.
The data store is a relational DBMS, and this is not for discussion :-) Specifically, it is the Microsoft SQL Server 2005.
I could easily create a table for the common properties and then a table for each of the specialized versions. However, it is likely that new entities will have to be added to the solution later and I don't want to maintain both an object model and a database schema.
Another idea is to create a table
reading(<common properties>, extended_properties)
and have the extended_properties
field be some kind of serialization of the extended properties. I was thinking either JSON or XML. I will most likely be using an ORM framework, but I haven't decided yet. Either way, the object representation of a specialized entity from the reading
could expose a dictionary {extended_property_name, value}
containing the parsed key/value pairs from the extended_properties
field.
From this http://msdn.microsoft.com/en-us/library/ms345117(SQL.90).aspx I gather that XML fields, combined with schemas for these, give the notion of typed XML inside the DBMS. Also, queries involving the XML contents in the extended_properties
field can take these into account, too.
Feedback on my solution suggestions, primarily the one with the reading
table and serialization of the extended properties.
Also, I realize this is one of the limitations of relational DBMS' compared to key/value based stores. However, there surely must be some modelling techniques to accommodate this.
Any feedback is greatly appreciated!
Virtually all relational databases support schemas, it's part of the SQL standard.
Schema is of three types: Logical Schema, Physical Schema and view Schema. Logical Schema – It describes the database designed at logical level. Physical Schema – It describes the database designed at physical level. View Schema – It defines the design of the database at the view level.
A schema defines a set of types or classes. Each type has a set of properties. A property either has an associated property type or is a group of properties. Properties with a property type have a structure according to that type's definition, they may have a value and properties of themselves.
There are two main kinds of database schema: A logical database schema conveys the logical constraints that apply to the stored data. It may define integrity constraints, views, and tables. A physical database schema lays out how data is stored physically on a storage system in terms of files and indices.
Anders, do not give up any integrity or hardness, eg type safety.
(Response coming).
@Anders. No, not at all, subtyping is fine (the question is which form you use and what are the dis/advantages). Do not give up any strength or Integrity or type safety or checks or DRI. The form you choose will demand additional Checks and maybe a bit of code (depends on your platform).
This subject is coming up frequently, but the seeker always has a narrow perspective; I keep making the same statements (a subset) from an unchanging set. The idea is to evaluate all the options. So I am writing a doc. Unfortunately it is taking longer. Maybe 4 pages. Not ready to post. But the diagrams are finished, I think you are on the ball, and you can use it right away.
Warning: Experienced Project Construction Engineers Only
Road not suitable for caravans or readers with a high Eek factor
Link to ▶Four Alternative Data Models◀ in Document Under Construction. Apologies for the mess on the floor; I will clean up soon.
▶Link to IDEF1X Notation◀ for anyone who is unfamiliar with the Standard for modelling Relational databases.
They are all Relational, with full integrity.
The 6NF options. Relational today (SQL) does not provide support for 6NF; it does not disallow it, it just does not provide the 5NF➔6NF structures. Therefore you need to build a small catalogue, what some people call "metadata". Really, it is just an extension of the standard SQL catalogue (sys tables). The level of control required is modelled in each option.
Essentially EAV done properly, with full control and integrity (type safety, Declarative Referential Integrity, etc) rather than the mess it usually is.
You may be interested in these related question/answers (in particular, look at the Data Models):
Multiple Fixed vs Abstract Flexible
Database Schema-Related Problem
"Simple" Database Design Problem
... That way, we can easily grab "Comment" rows associated with a given specialized type instance. Is this the way to do that, or will I regret that decision later? Is there any other pattern we're missing?
Not sure what you mean. Comments, Notes, Addresses, end up being used (columns resident in) in many tables, so the correct method is to Normalise them; provide One Table for Comment; that is referenced from any table that requires it. Here is a generic Comment table. It is used in Product (the supertype) because you stated any Product. It can just as easily be used in some of the Product subtypes, and not others; in which case the FK will be in said Product Subtypes.
Your Data Model
What is the purpose of the ProductType table in your Product 5NF/subtype example? Does it contain a row corresponding to each specialized Product, e.g., ProductCPU? I assume it indicates which specialization the base product is.
(Small critical mistake in the diagram, corrected.)
Yes, exactly.
In Standard Relational terms (not the uncontrolled messes passing off as databases), the ProductType is the Discriminator; it identifies which of the Product Subtypes apply to this Product. Tells you which Product Subtype table you need to join with. The pair together make a logical Product. Do not forget to produce the Views, one for each ProductType.
(Do evaluate how ProductType changes, exactly what role it plays, for each of the four Data Models.)
"Generalisation-specialisation" is all mumbo jumbo, OO terminology; without crossing the line and learning what Relational has been capable of for 30 years. If you learn a little about Relational, you will have the full power; otherwise you are limited to the very limited OO approach to everything (Ambler and Fowler have a lot to answer for). Please read this post, from 11 Dec 10 onwards. Relational databases model Entities, not objects; not classes.
For example, when adding a new product you'll want to provide, say, a dropdown selection of which product types it is possible to add. Based on this selection, it can be deduced which tables to put the data in. Correct? I'm sorry for talking about application code, but I just need to put it into perspective
Yes. And what page (with fields) to provide next, for the user to enter data.
No problem talking about the app code that will use the Rdb, they go together like husband and wife (not husband and slave).
For your OO classes, map the Class tree to the Rdb, once you have finished modelling the Rdb, independent of any app that will use it. Not the other way around. And not dependent on one app.
Forget about "persisting", it has many problems (Lost Updates; damaged data integrity; problematic debugging; massive contention; etc). All updates to the Rdb should be in Transactions, with ACID compliance, available for 30 years, but Fowler and Ambler have not read about it yet. Usually that means one stored proc pre xact.
The discriminant is a FK to a Type-table as we established earlier. It denotes which
spec.sub type the base type adheres to. But what does the discriminant table contain in detail?
Is that not clear from the data model ? ProducType CHAR(1)
or (2). Name Char(30)
.
Could be a display-friendly text stating the type for UI-purposes,
Yes, among other things, such as the control, contraint, etc, elimination of ambiguity when coding or reporting.
but does it also contain the exact table name which contains the specialized type?
No. That would be a little too physical to be placed in data. Disallowed on principle.
But it is not necessary.
Say I'm interested in the Product with ID = 1. It has a discriminant indicating that it is a ProductCPU. How would you go about retrieving this ProductCPU from your app code?
That will be easy if you take the provided model, and implement it (all the tables) as classes, correctly, etc. The example you request will not use Views (which are for lists, and more generic use). The pseudo-code would be:
ProductId
(Subtype unknown, therefore your should not be sitting a a Subtype-specific window), load the Product
supertype onlybased on the Discriminator Product.ProductType
, set indicators, etc, and load the applicable subtype, one of ProductCPU; ProductMemory; ProductDisk; ProductTape
; etc.
I have seen (and do not agree with) OO methods that load all subtypes for the given ProductId
at once: one subtype is valid; and the rest are invalid. The code still has to constrain itself to the valid class for the Product
based on Product.ProductType
.
Alternately, eg. where the context is, the user is sitting in a Subtype-specific window, eg. ProductCPU
, with that class set up, and requests ProductId
xxx. Then use the ProductCPU
View. If it returns zero rows, it does not exist.
ProductDisk
xxx, but not a ProductCPU
xxx. How you handle that, whether you indicate there is a Product`xxx but it isn't a CPU, or not, that depends on the app requirements.For lists, where the app fills in a grid, without regard to the ProductId
, use the views (one each) to load each grid. That SQL is based on the join, and does not need to refer to ProductType
.
I would go for the "create a table for the common properties and then a table for each of the specialized versions" method, personally.
Reason: you say that your implementation will be done in a RDBMS and this is non-negotiable. Fine. Dumping unstructured, blob-like stuff like a serialized hashtable in a DB field goes against the design philosophy of RDBMS though, so you will have a severe hit on efficiency unless you are ok with the idea of treating the *extended_properties* field as an opaque blob, just like a gif or another binary object.
In other words, forget querying (efficiently) for "all the objects having extended property COLOR=RED".
The problem you have (describing OO taxonomies in a RDBMS) is definitely not new. Have a look at this, for a in-depth description of the options.
This is a classic example of the gen-spec design pattern. Gen-spec is covered in every tutorial on object modeling, because it is handled by inheritance. It's frequently skipped over in tutorials on relational data modeling. But it's well understood.
Do a web sreach on "generalization specialization relational modeling". You'll see several articles on how to set up a single table for the general class and a table for each specialized class. The articles will help you with foreign key design. In particular, the primary key of each specialized table does double duty. It's also a foreign key to the generalized table.
This won't look very familiar to you if you are used to object modeling. But you'll find it works well. And the solution offered by most of the articles is not dynamic, so you'll have to do some DDL every time a new specialized subclass is discovered.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With