
Performance of large EAV/open schema systems on SQL Server

Has anyone implemented a very large EAV or open schema style database in SQL Server? I'm wondering if there are performance issues with this and how you were able to overcome those obstacles.

JC Grubbs asked Oct 10 '08 20:10


1 Answer

Regardless of MS SQL Server versus any other brand of database, the worst performance issue with EAV is that people try to do monster queries to reconstruct an entity on a single row. This requires a separate join per attribute.

SELECT e.entity_id, a1.attr_value AS "cost", a2.attr_value AS "color",
  a3.attr_value AS "size", . . .
FROM entity e
  LEFT OUTER JOIN attrib a1 ON (e.entity_id = a1.entity_id AND a1.attr_name = 'cost')
  LEFT OUTER JOIN attrib a2 ON (e.entity_id = a2.entity_id AND a2.attr_name = 'color')
  LEFT OUTER JOIN attrib a3 ON (e.entity_id = a3.entity_id AND a3.attr_name = 'size')
  . . . additional joins for each attribute . . .

No matter what database brand you use, every additional join adds cost, and that cost climbs steeply as the joins pile up. Sooner or later you need more attributes than the engine can join in a single query; MySQL, for example, refuses to join more than 61 tables.
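To see how quickly the single-row form grows, here is a small Python sketch that generates it for an arbitrary attribute list. The helper name `build_pivot_query` is made up for this illustration, and the attribute names are assumed to be trusted literals; it is only meant to show that the join count tracks the attribute count one for one.

```python
def build_pivot_query(attr_names):
    """Build the one-row-per-entity EAV query: one LEFT OUTER JOIN per attribute.

    Illustration only: attr_names are assumed to be trusted literals,
    so nothing here escapes or validates them.
    """
    select_cols = ["e.entity_id"]
    joins = []
    for i, name in enumerate(attr_names, start=1):
        alias = f"a{i}"
        select_cols.append(f'{alias}.attr_value AS "{name}"')
        joins.append(
            f"  LEFT OUTER JOIN attrib {alias} "
            f"ON (e.entity_id = {alias}.entity_id AND {alias}.attr_name = '{name}')"
        )
    return "SELECT " + ", ".join(select_cols) + "\nFROM entity e\n" + "\n".join(joins)


sql = build_pivot_query(["cost", "color", "size"])
join_count = sql.count("LEFT OUTER JOIN")  # one join per requested attribute
```

Asking for 50 attributes yields a statement with 50 outer joins against the same table, which is the shape of query the answer warns about.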

The solution is to fetch the attributes in rows instead of columns, and write a class in application code to loop over these rows, assigning the values into object properties one by one.

SELECT e.entity_id, a.attr_name, a.attr_value
FROM entity e JOIN attrib a ON e.entity_id = a.entity_id
ORDER BY e.entity_id;

This query is so much simpler and more efficient that it makes up for the extra application code.
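As a rough end-to-end sketch of that approach, the following Python uses an in-memory SQLite database as a stand-in for SQL Server. The table layout and sample data are assumptions made up for the example; only the table and column names come from the queries above.

```python
import sqlite3

# In-memory SQLite stands in for the real database; schema and data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (entity_id INTEGER PRIMARY KEY);
    CREATE TABLE attrib (
        entity_id  INTEGER REFERENCES entity(entity_id),
        attr_name  TEXT,
        attr_value TEXT
    );
    INSERT INTO entity VALUES (1), (2);
    INSERT INTO attrib VALUES
        (1, 'cost', '9.99'), (1, 'color', 'red'),
        (2, 'cost', '4.50'), (2, 'size', 'L');
""")

# One join, one row per (entity, attribute) pair.
rows = conn.execute("""
    SELECT e.entity_id, a.attr_name, a.attr_value
    FROM entity e JOIN attrib a ON e.entity_id = a.entity_id
    ORDER BY e.entity_id
""").fetchall()

# Loop over the rows, assigning each value to its entity one by one.
entities = {}
for entity_id, attr_name, attr_value in rows:
    entities.setdefault(entity_id, {})[attr_name] = attr_value
```

Each entity ends up with only the attributes it actually has, so sparse attributes cost nothing extra in the query.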

What I would look for in an EAV framework is some boilerplate code that retrieves a multi-row result set like this, and maps the attributes into object properties, and then returns the collection of populated objects.
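A minimal sketch of what such boilerplate might look like in Python is below. `load_entities` and its `factory` parameter are hypothetical names, not part of any particular EAV framework, and the rows are hard-coded so the example runs without a database.

```python
from types import SimpleNamespace


def load_entities(rows, factory=SimpleNamespace):
    """Map (entity_id, attr_name, attr_value) rows to populated objects.

    Hypothetical helper: `factory` is any zero-argument callable returning
    an object that accepts arbitrary attributes.
    """
    objects = {}
    for entity_id, attr_name, attr_value in rows:
        obj = objects.get(entity_id)
        if obj is None:
            # First row seen for this entity: create the object and tag it.
            obj = objects[entity_id] = factory()
            obj.entity_id = entity_id
        setattr(obj, attr_name, attr_value)
    return list(objects.values())


# Rows shaped like the result set of the row-oriented query.
rows = [
    (1, "cost", "9.99"), (1, "color", "red"),
    (2, "cost", "4.50"), (2, "size", "L"),
]
products = load_entities(rows)
```

Note that an attribute literally named `entity_id` would overwrite the identifier in this sketch; a real mapper would need a policy for name collisions and for converting values from strings to proper types.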

Bill Karwin answered Oct 14 '22 07:10