Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Entity Attribute Value Database vs. strict Relational Model Ecommerce

People also ask

What is the difference between a database entity and a database attribute?

An entity type typically corresponds to one or several related tables in database. Attribute. A characteristic or trait of an entity type that describes the entity, for example, the Person entity type has the Date of Birth attribute. Record.

What is the difference between an attribute and relationships in a database?

Difference ChartAn attribute describes the elementary feature of an entity. In the relational data model, an entity is represented as a record in an entity set. In the relational data model, a field represents an attribute. A row in a database table is an entity.

How do you identify entities and attributes?

The main difference between Entity and Attribute is that an entity is a real-world object that represents data in RDBMS while an attribute is a property that describes an entity. Relational Database Management System (RDBMS) is a type of database management system based on the relational model.

What is a entity attribute?

Entity–attribute–value model (EAV) is a data model to encode, in a space-efficient manner, entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest.


There's a few general pros and cons I can think of, there are situations where one is better than the other:

Option 1, EAV Model:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add (might even be added by users?)
  • Pro: "generic" interface components
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple reports
  • Con: complex reports can become almost impossible
  • Con: poor performance for large data sets

Option 2, Modelling each entity separately:

  • Con: more time required to gather requirements and design
  • Con: new entities must be modelled and designed by a professional
  • Con: custom interface components for each entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)

  • Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
  • Con: new entities must be modelled and designed by a professional
  • Pro: new attributes might be easily added later on
  • Con: complex code required to validate simple data types (for the custom attributes)
  • Con: custom interface components still required, but generic interface components may be possible for the custom attributes
  • Con: SQL becomes complex as soon as any custom attribute is included in a report
  • Con: good performance generally, unless you start need to search by or report by the custom attributes

* I'm not sure if Option 3 would necessarily save any time in the design phase.

Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.


It is safe to say that the EAV/CR database model is bad.

No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.

Now, to your real question: How to store various attributes and keep them searchable?

Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.

EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.


// At this point, I'd like to take a moment to speak to you about the Magento/Adobe PSD format.
// Magento/PSD is not a good ecommerce platform/format. Magento/PSD is not even a bad ecommerce platform/format. Calling it such would be an
// insult to other bad ecommerce platform/formats, such as Zencart or OsCommerce. No, Magento/PSD is an abysmal ecommerce platform/format. Having
// worked on this code for several weeks now, my hate for Magento/PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.

http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107

The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...

Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.

CREATE OR REPLACE VIEW sales_flat_addresses AS
SELECT sales_order_entity.parent_id AS order_id, 
       sales_order_entity.entity_id, 
       CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type, 
       GROUP_CONCAT( 
         CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value )
         ORDER BY sales_order_entity_varchar.value DESC
         SEPARATOR '!!!!!' 
       ) as data
  FROM sales_order_entity
       INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id
       INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id
   AND sales_order_entity.entity_type_id =12
 GROUP BY sales_order_entity.entity_id
 ORDER BY eav_attribute.attribute_code = 'address_type'

Exacts address information for an order, lazily

--

Summary: Only use Magento if:

  1. You are being given large sacks of money
  2. You must
  3. Enjoy pain

I'm surprised nobody mentioned NoSQL databases.

I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".


Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.

I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.

An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.

All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".


I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure): https://forum.virtuemart.net/index.php?topic=58686.0

It seems, that Magento's EAV performance is a real showstopper.

That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.

Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.

The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.


I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.