Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best pattern for storing (product) attributes in SQL Server

We are starting a new project where we need to store product and many product attributes in a database. The technology stack is MS SQL 2008 and Entity Framework 4.0 / LINQ for data access.

The products (and Products Table) are pretty straightforward (a SKU, manufacturer, price, etc..). However there are also many attributes to store with each product (think industrial widgets). These may range from color to certification(s) to pipe size. Every product may have different attributes, and some may have multiples of the same attribute (Ex: Certifications).

The current proposal is that we will basically have a name/value pair table with a FK back to the product ID in each row.

An example of the attributes Table may look like this:

ProdID     AttributeName     AttributeValue
123        Color             Blue
123        FittingSize       1.25
123        Certification     AS1111
123        Certification     EE2212
123        Certification     FM.3
456        Pipe              11
678        Color             Red
999        Certification     AE1111
...

Note: Attribute name would likely come from a lookup table or enum.

So the main question here is: Is this the best pattern for doing something like this? How will the performance be? Queries will be based on a JOIN of the product and attributes table, and generally need many WHEREs to filter on specific attributes - the most common search will be to find a product based on a set of known/desired attributes.

If anyone has any suggestions or a better pattern for this type of data, please let me know.

Thanks! -Ed

like image 601
EdH Avatar asked May 26 '10 00:05

EdH


People also ask

Which is best way to store attribute of large string value in SQL?

The MEDIUMTEXT data object is useful for storing larger text strings like white papers, books, and code backup. These data objects can be as large as 16 MB (expressed as 2^24 -1) or 16,777,215 characters and require 3 bytes of overhead storage.

Can an attribute have multiple values SQL?

attributes cannot contain multiple values (elements can) attributes cannot contain tree structures (elements can) attributes are not easily expandable (for future changes)

How many attributes are there in SQL?

There are three types of attributes: Free-form attributes, which allow free-form input for text, numbers, dates, or links. Domain-based attributes, which are populated by entities. For more information, see Domain-Based Attributes (Master Data Services).

What are product attributes?

Product attributes are the features that describe an item – objective and subjective details that can help a potential customer decide what to buy. These include, but aren't limited to, technical specifications, design features like color and size, materials used in manufacturing, and the price.


3 Answers

You are about to re-invent the dreaded EAV model, Entity-Attribute-Value. This is notorious for having problems in real-life, for various reasons, many covered by Dave's answer.

Luckly the SQL Customer Advisory Team (SQLCAT) has a whitepaper on the topic, Best Practices for Semantic Data Modeling for Performance and Scalability. I highly recommend this paper. Unfortunately, it does not offer a panacea, a cookie cutter solution, since the problem has no solution. Instead, you'll learn how to find the balance between a fixed queryable schema and a flexible EAV structure, a balance that works for your specific case:

Semantic data models can be very complex and until semantic databases are commonly available, the challenge remains to find the optimal balance between the pure object model and the pure relational model for each application. The key to success is to understand the issues, make the necessary mitigations for those issues, and then test, test, and test. Scalability testing is a critical success factor if you are going to find that optimal design.

like image 144
Remus Rusanu Avatar answered Sep 28 '22 19:09

Remus Rusanu


This is going to be problematic for a couple of reasons:

  • Your entity queries will be much harder to write. Transforming the results of those queries into something resembling a ViewModel when it comes time for presentation is going to be painful because it will involve a pivot for each product.

  • Understanding what your datatypes will be is going to be tough when it comes time to read certain types of data. Are you planning on storing this as strings? For example, DateTimes hold more data than the default .ToString() implementation writes to the string. You're also going to have issues if you try to store floating-point values.

  • Your objects' data integrity is at risk. There will be a temptation to put properties which should be just attributes of your main product tables in this "bucket o' data". Maybe the design will be semi-sane to begin with, but I guarantee you that after a certain amount of time, folks will start to just throw properties in the bag. It'll then be very tough to keep your objects' integrity with such a loosely defined structure.

  • Your indexes will most likely be suboptimal. Again think of a property which should be on your product table. Instead of being able to index on just one column, you will now be forced to make a potentially very large composite index on your "type" table.

  • Since you're apparently planning to throw out proper datatypes and use strings, the performance of range queries for numeric data will likely be poor.

  • Your table will get big, slowing backups and queries. Instead of an integer being 4 bytes, you're going to have to store far more for an integer of any size.

Better to normalize the table in a more "traditional" way using "IS-A" relationships. For example, you might have Pipes, which are a type of Product, but have a couple more attributes. You might have Stoves, which are a type of product, but have a couple more attributes still.

If you really have a generic database and all sorts of other properties which aren't going to be subject to data integrity rules, you very well may want to consider storing data in an XML column. It's hard to tell you what the correct design choice is unless I know a lot more about your business.

IMO this is a design antipattern. The siren song of this idea has lured many a developer onto the rocks of of an unmaintainable application.

like image 27
Dave Markle Avatar answered Sep 28 '22 19:09

Dave Markle


I know it is an old one - however there might be other readers...

I have seen the balance EAV to attribute modeled approach. Well - it is still EAV. "EAV's are like drugs" is pretty much true. So what about thinking it through once more - and let's be aggressive really: I still liked the supertype apporach, where a lot of tables use the same primary key from a key generator. Let's reuse this one. So what about creating a new table for each set of attributes - all having the primary from the same key generator? Eg. you would have a table with the fields "color,pipe", another table "fittingsize,pipe", and so on. The requirement "volatility of attributes" screams for a carefully(automatically) maintained data dictionary anyway.

This approach is fully normalized and can be fully automated. You can support checks if specific attribute sets materialized already as table by hashing attribute name clusters, eg. crc32(lower('color~fittingsize~pipe')) where the atribute names need to be sorted alphabetically. Of course this requires to have the hash in the data dictionary. Based on the data dictionary each object can be searched (using 'UNION'), especially if the data dictionary itself is a table. Having the data dictionary as table also allows you to use its primary (surrogate) key as basis for unique tablenames, to end up with tables like 'attributes1','attributes2',... Most databases nowadays support some billion tables - so we are sort of save on that end as well. You could even have a product catalouge with very common attributes, that reference the extended attribute tables.

An open issue are 1:n data sets. I am afraid you need to sort them out in separate tables. However this very much depends on your data presentation and querying strategy. Should they always be presented as comma seperated string attached to the product or do you want to eg. be able to query for all products of a certain Certification?

Before you flame this approach please consider this: It is meant for use cases where you have a very high volatility of attributes - in quantity and quality - only. Also it was preset, that you cannot know most of the attributes at the point in time when the solution is created. So do not discuss this in a context where you can model your attributes upfront which would enable you to balance trade offs much better.

like image 37
Steffen Avatar answered Sep 28 '22 20:09

Steffen