
Best pattern for modeling sparse attributes

I work on a financial services product that stores lots of information on the end-customer. Our clients continuously want to add new attributes that often aren't used to drive any process in our product. They are captured and displayed but nothing else. Due to differences in how our clients operate, they often want to store very different values. We have tried two solutions to accommodate them:

  1. Sparsely populated tables with hundreds of columns.
  2. Entity Attribute Value tables where the customer can define new columns as needed.

We have experienced most of the disadvantages of both solutions. Lots of columns provide comfort for us in that we know what data we are adding to our database, but can make us appear inflexible and expensive when a customer 'just' wants to store a new value, like Favourite Golf Club. EAV has shown all of its usual problems: poorly performing queries, losing control of the data, lack of validation and maintainability problems.

So is there a better pattern out there?

John Cullen asked Oct 14 '13 15:10


2 Answers

I did a presentation about this subject at Percona Live MySQL Conference & Expo 2013. My presentation was called Extensible Data Modeling.

For your situation, since the user-defined attributes are not used in your SQL queries (only captured and displayed as you say), I would recommend the Serialized LOB pattern.
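
As a rough illustration of the Serialized LOB idea (a sketch only, not taken from the presentation): the attributes the product actually uses stay in ordinary columns, and everything client-specific is serialized into a single text column. The table name `customer`, the column `extra_attributes`, the helper functions, the sample values, and the choice of JSON on SQLite are all assumptions made to keep the example self-contained.

```python
import json
import sqlite3

# In-memory SQLite database; a real product would use its own RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,                        -- attribute the product relies on
        extra_attributes TEXT NOT NULL DEFAULT '{}' -- serialized client-defined attributes
    )
    """
)


def save_customer(conn, name, extra):
    """Insert a customer, serializing the client-defined attributes as JSON."""
    cur = conn.execute(
        "INSERT INTO customer (name, extra_attributes) VALUES (?, ?)",
        (name, json.dumps(extra)),
    )
    return cur.lastrowid


def load_customer(conn, customer_id):
    """Fetch a customer and deserialize the attribute blob for display."""
    row = conn.execute(
        "SELECT id, name, extra_attributes FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "name": row[1], "extra": json.loads(row[2])}


# Hypothetical sample data: a client adds a new attribute without a schema change.
cid = save_customer(conn, "A. Customer", {"Favourite Golf Club": "St Andrews"})
customer = load_customer(conn, cid)
```

The trade-off in this sketch is that the database cannot validate or index what sits inside `extra_attributes`, which is acceptable only because the values are captured and displayed rather than queried.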

Here's the abstract of my presentation. The slides are freely available:

Designing an extensible, flexible schema that supports user customization is a common requirement, but it's easy to paint yourself into a corner.

Examples of extensible database requirements:

  • A database that allows users to declare new fields on demand.
  • Or an e-commerce catalog with many products, each with distinct attributes.
  • Or a content management platform that supports extensions for custom data.

The solutions we use to meet these requirements are often overly complex, and their performance is terrible. How should we find the right balance between schema and schemaless database design?

I'll briefly cover the disadvantages of Entity-Attribute-Value (EAV), a problematic design that's an example of the antipattern called the Inner-Platform Effect: that is, modeling an attribute-management system on top of the RDBMS architecture, which already provides attributes through columns, data types, and constraints.

Then we'll discuss the pros and cons of alternative data modeling patterns, with respect to developer productivity, data integrity, storage efficiency and query performance, and ease of extensibility.

  • Class Table Inheritance
  • Serialized BLOB
  • Inverted Indexing

Finally we'll show tools like pt-online-schema-change and new features of MySQL 5.6 that take the pain out of schema modifications.

Bill Karwin answered Nov 02 '22 07:11


I would model this as a separate attributes table, not a table with multiple "custom" columns. What happens when you have 100 columns and they want to add attribute #101? And what about clients with very few custom attributes? They end up with a hundred NULL columns.

In this case, your storage type can simply be VARCHAR(MAX) because you perform no logic on these columns except to SELECT and display them. The consequence is that you have potentially inefficient storage of INT or DATE types (or whatever disparate types you may want to store) but that is the nature of allowing the client to store anything in these custom fields.

Consider a table having five columns:

  • Id
  • ParentType
  • ParentId
  • CustomValueName
  • CustomValue

So now you have enough information to:

  1. Distinctly tie your custom attribute to any other entity in your DB
  2. Name the attribute types for custom aggregations if needed
  3. Append any value the user wants

The downside is that it's somewhat painful to query on these custom attributes (though it can be done easily in SQL, the query plan won't be very efficient).
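
A minimal sketch of the five-column table described above, and of why filtering on it gets awkward. The table name `CustomAttribute`, the index, the helper `attributes_for`, and the sample rows are assumptions for illustration; SQLite's `TEXT` stands in for `VARCHAR(MAX)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CustomAttribute (
        Id INTEGER PRIMARY KEY,
        ParentType TEXT NOT NULL,       -- which entity the value belongs to
        ParentId INTEGER NOT NULL,      -- key of that entity's row
        CustomValueName TEXT NOT NULL,
        CustomValue TEXT                -- everything is stored as text
    );
    CREATE INDEX idx_custom_parent ON CustomAttribute (ParentType, ParentId);
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO CustomAttribute (ParentType, ParentId, CustomValueName, CustomValue) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Customer", 1, "Favourite Golf Club", "St Andrews"),
        ("Customer", 1, "Handicap", "12"),
        ("Customer", 2, "Favourite Golf Club", "Augusta"),
    ],
)


def attributes_for(conn, parent_type, parent_id):
    """Select-and-display case: one simple lookup per entity."""
    cur = conn.execute(
        "SELECT CustomValueName, CustomValue FROM CustomAttribute "
        "WHERE ParentType = ? AND ParentId = ?",
        (parent_type, parent_id),
    )
    return dict(cur.fetchall())


# Filtering on two attributes at once needs one self-join per attribute,
# plus a cast because the values are stored as text.
both = conn.execute(
    """
    SELECT club.ParentId
    FROM CustomAttribute AS club
    JOIN CustomAttribute AS hcp
      ON hcp.ParentType = club.ParentType AND hcp.ParentId = club.ParentId
    WHERE club.ParentType = 'Customer'
      AND club.CustomValueName = 'Favourite Golf Club'
      AND club.CustomValue = 'St Andrews'
      AND hcp.CustomValueName = 'Handicap'
      AND CAST(hcp.CustomValue AS INTEGER) < 15
    """
).fetchall()
```

Displaying an entity's attributes stays a single indexed lookup, while every extra attribute in a filter adds another self-join, which is where the query plans tend to degrade.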

Matthew answered Nov 02 '22 07:11