 

Dynamic columns in database tables vs EAV

I'm trying to decide which way to go if I have an app that needs to be able to change the db schema based on the user input.

For example, if I have a "car" object that contains car properties, like year, model, # of doors etc, how do I store it in the DB in such a way, that the user should be able to add new properties?

I read about EAV tables and they seem right for this thing, but the problem is that queries will get pretty complicated when I try to get a list of cars filtered by a set of properties.

Could I generate the tables dynamically instead? I see that SQLite has support for ADD COLUMN, but how fast is it when the table reaches many records? And it looks like there's no way to remove a column. I have to create a new table without the column I want to remove, and copy the data from the old table. That's certainly slow on large tables :(

Alex asked May 08 '15



2 Answers

Another option that I haven't seen mentioned above is to use denormalized tables for the extended attributes. This is a combination of the pseudo-generic model and the dynamic columns in database tables. Instead of adding columns to existing tables, you add columns or groups of columns in new tables with FK indexes back to the source table. Of course, you'll want a good naming convention (car, car_attributes_door, car_attributes_littleOldLadies).

  • Your selection problem becomes that of applying a LEFT OUTER JOIN to include the extended attributes that you want to include.
    • Slower than normalized, but not as slow as EAV.
  • Adding new extended attributes becomes a problem of adding a new table.
    • Harder than EAV, easier/faster than modifying table schema.
  • Deleting attributes becomes a problem of dropping whole tables.
    • Easier/faster than modifying table schema.
  • These new attributes can be strongly typed.
    • As good as modifying table schema, faster than EAV or generic columns.

The biggest advantage to this approach that I can see is that deleting unused attributes is quite easy compared to any of the others via a single DROP TABLE command. You also have the option to later normalize often-used attributes into larger groups or into the main table using a single ALTER TABLE process rather than one for each new column you were adding as you added them, which helps with the slow LEFT OUTER JOIN queries.
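A minimal sketch of this layout using Python's built-in sqlite3 module (the table and column names are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Main table holds the core, stable attributes.
cur.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, brand TEXT, model TEXT)")

# Each group of extended attributes lives in its own table,
# keyed back to the main table by a foreign key.
cur.execute("""
    CREATE TABLE car_attributes_doors (
        car_id INTEGER PRIMARY KEY REFERENCES car(car_id),
        doors  INTEGER
    )""")

cur.execute("INSERT INTO car VALUES (1, 'Pontiac', 'Firebird')")
cur.execute("INSERT INTO car_attributes_doors VALUES (1, 2)")

# Selecting means LEFT OUTER JOINing the attribute tables you need.
row = cur.execute("""
    SELECT c.brand, c.model, d.doors
    FROM car c
    LEFT OUTER JOIN car_attributes_doors d ON d.car_id = c.car_id
    WHERE c.car_id = 1
""").fetchone()

# Dropping an attribute group later is a single statement.
cur.execute("DROP TABLE car_attributes_doors")
```

Note that the extended column (`doors`) can be strongly typed as INTEGER, unlike an EAV value column.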

The biggest disadvantage is that you're cluttering up your table list, which admittedly is often not a trivial concern. That and I'm not sure how much better LEFT OUTER JOIN's actually perform than EAV table joins. It's definitely closer to EAV join performance than normalized table performance.

If you're doing a lot of comparisons/filters of values that benefit greatly from strongly typed columns, but you add/remove these columns frequently enough to make modifying a huge normalized table intractable, this seems like a good compromise.

JVal90 answered Oct 15 '22


I will assume that SQLite (or another relational DBMS) is a requirement.

EAVs

I have worked with EAVs and generic data models, and I can say that the data model is very messy and hard to work with in the long run.

Let's say that you design a data model with three tables: entities, attributes, and entity_attributes:

CREATE TABLE entities
(entity_id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE attributes 
(attribute_id INTEGER PRIMARY KEY, name TEXT, type TEXT);

CREATE TABLE entity_attributes 
(entity_id INTEGER, attribute_id INTEGER, value TEXT, 
PRIMARY KEY(entity_id, attribute_id));

In this model, the entities table will hold your cars, the attributes table will hold the attributes that you can associate with your cars (brand, model, color, ...) and their types (text, number, date, ...), and the entity_attributes table will hold the values of the attributes for a given entity (for example "red").

Take into account that with this model you can store as many entities as you want and they can be cars, houses, computers, dogs or whatever (ok, maybe you need a new field on entities, but it's enough for the example).

INSERTs are pretty straightforward. You only need to insert a new object, a bunch of attributes and their relations. For example, to insert a new entity with 3 attributes you will need to execute 7 inserts (one for the entity, three more for the attributes, and three more for the relations).
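The seven statements described above look like this with sqlite3 (sample values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, name TEXT, type TEXT)")
cur.execute("""CREATE TABLE entity_attributes
               (entity_id INTEGER, attribute_id INTEGER, value TEXT,
                PRIMARY KEY (entity_id, attribute_id))""")

# 1 insert for the entity...
cur.execute("INSERT INTO entities VALUES (1, 'my car')")
# ...3 for the attributes...
cur.executemany("INSERT INTO attributes VALUES (?, ?, ?)",
                [(1, 'brand', 'text'), (2, 'model', 'text'), (3, 'color', 'text')])
# ...and 3 for the relations: 7 rows in total for one three-attribute entity.
cur.executemany("INSERT INTO entity_attributes VALUES (?, ?, ?)",
                [(1, 1, 'Pontiac'), (1, 2, 'Firebird'), (1, 3, 'red')])

total = cur.execute("""SELECT (SELECT COUNT(*) FROM entities)
                            + (SELECT COUNT(*) FROM attributes)
                            + (SELECT COUNT(*) FROM entity_attributes)""").fetchone()[0]
```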

When you want to perform an UPDATE, you will need to know which entity you want to update, and update the desired attribute by joining through the relation between the entity and its attributes.

When you want to perform a DELETE, you will also need to know which entity you want to delete, delete its attributes, delete the relation between your entity and its attributes, and then delete the entity itself.

But when you want to perform a SELECT the thing becomes nasty (you need to write really difficult queries) and the performance drops horribly.

Imagine a data model to store car entities and their properties as in your example (say that we want to store brand and model). With a plain table, a SELECT to query all your records would be

SELECT brand, model FROM cars;

If you design a generic data model as in the example, the SELECT to query all your stored cars will be really difficult to write and will involve a three-table join. The query will perform really badly.

Also, think about the definition of your attributes. All your attributes are stored as TEXT, and this can be a problem. What if somebody makes a mistake and stores "red" as a price?

Indexes are another thing that you cannot benefit from (or at least not as much as would be desirable), and they become very necessary as the stored data grows.

As you say, the main concern as a developer is that the queries are really hard to write, hard to test and hard to maintain (how much would a client have to pay to buy all red, 1980, Pontiac Firebirds that you have?), and will perform very poorly when the data volume increases.
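To make that concrete, here is a sketch of that "red, 1980 Pontiac" filter against the three-table schema above, assuming attribute ids 1, 2 and 3 for brand, year and color: one extra join per filtered attribute, and every comparison happens on a TEXT column (even the year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE entity_attributes
        (entity_id INTEGER, attribute_id INTEGER, value TEXT,
         PRIMARY KEY (entity_id, attribute_id));
    INSERT INTO entities VALUES (1, 'car 1'), (2, 'car 2');
    INSERT INTO attributes VALUES
        (1, 'brand', 'text'), (2, 'year', 'number'), (3, 'color', 'text');
    INSERT INTO entity_attributes VALUES
        (1, 1, 'Pontiac'), (1, 2, '1980'), (1, 3, 'red'),
        (2, 1, 'Pontiac'), (2, 2, '1995'), (2, 3, 'blue');
""")

# One self-join on entity_attributes per attribute you filter on.
rows = cur.execute("""
    SELECT e.entity_id
    FROM entities e
    JOIN entity_attributes brand ON brand.entity_id = e.entity_id
         AND brand.attribute_id = 1 AND brand.value = 'Pontiac'
    JOIN entity_attributes yr    ON yr.entity_id = e.entity_id
         AND yr.attribute_id = 2 AND yr.value = '1980'
    JOIN entity_attributes color ON color.entity_id = e.entity_id
         AND color.attribute_id = 3 AND color.value = 'red'
""").fetchall()
```

Note that the year comparison is a string comparison, because the value column holds everything as TEXT; a range filter like "1975 to 1985" would need casts.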

The only advantage of using EAVs is that you can store virtually everything with the same model, but it is like having a box full of stuff in which you want to find one concrete, small item.

Also, to use an argument from authority, I will say that Tom Kyte argues strongly against generic data models: http://tkyte.blogspot.com.es/2009/01/this-should-be-fun-to-watch.html https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

Dynamic columns in database tables

On the other hand, you can, as you say, generate the tables dynamically, adding (and removing) columns when needed. In this case, you can, for example create a car table with the basic attributes that you know that you will use and then add columns dynamically when you need them (for example the number of exhausts).

The disadvantage is that you will need to add columns to an existing table and (maybe) build new indexes.

This model, as you say, also has another problem when working with SQLite as there's no direct way to delete columns and you will need to do this as stated on http://www.sqlite.org/faq.html#q11

BEGIN TRANSACTION;
CREATE TEMPORARY TABLE t1_backup(a,b);
INSERT INTO t1_backup SELECT a,b FROM t1;
DROP TABLE t1;
CREATE TABLE t1(a,b);
INSERT INTO t1 SELECT a,b FROM t1_backup;
DROP TABLE t1_backup;
COMMIT;
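Worth noting: since version 3.35.0 (2021) SQLite supports a native ALTER TABLE ... DROP COLUMN, so on recent versions the copy dance is no longer needed. On older versions, the documented workaround above can be scripted like this (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (a TEXT, b TEXT, c TEXT)")  # we want to drop column c
cur.execute("INSERT INTO t1 VALUES ('1', '2', 'junk')")
conn.commit()

# The pre-3.35 workaround: copy the surviving columns out and back.
cur.executescript("""
    BEGIN;
    CREATE TEMPORARY TABLE t1_backup(a, b);
    INSERT INTO t1_backup SELECT a, b FROM t1;
    DROP TABLE t1;
    CREATE TABLE t1(a, b);
    INSERT INTO t1 SELECT a, b FROM t1_backup;
    DROP TABLE t1_backup;
    COMMIT;
""")

cols = [r[1] for r in cur.execute("PRAGMA table_info(t1)")]
```

As the answer says, this rewrites every row, so it is O(n) in the table size, whereas ADD COLUMN (and the native DROP COLUMN on new versions) only touches the schema.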

Anyway, I don't really think that you will need to delete columns (or at least it will be a very rare scenario). Maybe someone adds the number of doors as a column, and stores a car with this property. You will need to make sure that none of your cars still uses this property, to avoid losing data, before deleting the column. But this, of course, depends on your concrete scenario.

Another drawback of this solution is that you will need a table for each entity you want to store (one table to store cars, another to store houses, and so on...).

Another option (pseudo-generic model)

A third option could be to have a pseudo-generic model, with a table having columns to store id, name, and type of the entity, and a given (enough) number of generic columns to store the attributes of your entities.

Lets say that you create a table like this:

CREATE TABLE entities
(entity_id INTEGER PRIMARY KEY,
 name TEXT,
 type TEXT,
 attribute1 TEXT,
 attribute2 TEXT,
 ...
 attributeN TEXT
 );

In this table you can store any entity (cars, houses, dogs) because you have a type field and you can store as many attributes for each entity as you want (N in this case).

If you need to know what attribute37 stands for when type is "car", you would need to add another table that relates the types and attributes with a description of each attribute.

And what if you find that one of your entities needs more attributes? Then simply add new columns to the entities table (attributeN+1, ...).

In this case, the attributes are always stored as TEXT (as in EAVs), with the disadvantages that entails.

But you can use indexes, the queries are really simple, the model is generic enough for your case, and in general, I think that the benefits of this model are greater than the drawbacks.
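A runnable sketch of the pseudo-generic model, including the attribute-description lookup table mentioned above (table and value names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Pseudo-generic table: a fixed pool of typeless attribute columns.
cur.execute("""
    CREATE TABLE entities (
        entity_id INTEGER PRIMARY KEY,
        name TEXT, type TEXT,
        attribute1 TEXT, attribute2 TEXT, attribute3 TEXT
    )""")

# Lookup table mapping (entity type, column) to what that column means.
cur.execute("""
    CREATE TABLE attribute_meanings (
        type TEXT, column_name TEXT, meaning TEXT,
        PRIMARY KEY (type, column_name)
    )""")
cur.executemany("INSERT INTO attribute_meanings VALUES (?, ?, ?)",
                [('car', 'attribute1', 'brand'),
                 ('car', 'attribute2', 'model'),
                 ('car', 'attribute3', 'color')])

cur.execute("INSERT INTO entities VALUES (1, 'my car', 'car', 'Pontiac', 'Firebird', 'red')")

# Queries stay flat: no joins are needed to read the attributes themselves.
brand, model = cur.execute(
    "SELECT attribute1, attribute2 FROM entities WHERE type = 'car'").fetchone()
```

The application (or a view per entity type) translates column positions into meaningful names via the lookup table.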

Hope it helps.


Follow up from the comments:

With the pseudo-generic model your entities table will have a lot of columns. From the documentation (https://www.sqlite.org/limits.html), the default setting for SQLITE_MAX_COLUMN is 2000. I have worked with SQLite tables with over 100 columns with great performance, so 40 columns shouldn't be a big deal for SQLite.

As you say, most of your columns will be empty for most of your records, and you will need to index all of your columns for performance, but you can use partial indexes (https://www.sqlite.org/partialindex.html). This way, your indexes will be small, even with a high number of rows, and the selectivity of each index will be great.
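A partial index on one of the generic columns looks like this (illustrative names); only the rows where the column is actually set end up in the index, so a mostly-empty column costs almost nothing to index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE entities
               (entity_id INTEGER PRIMARY KEY, type TEXT, attribute1 TEXT)""")
cur.executemany("INSERT INTO entities (type, attribute1) VALUES (?, ?)",
                [('car', 'Pontiac'), ('car', None), ('house', None)])

# Partial index: only non-null values are indexed (requires SQLite >= 3.8.0).
cur.execute("""CREATE INDEX idx_attr1 ON entities (attribute1)
               WHERE attribute1 IS NOT NULL""")

# An equality filter implies NOT NULL, so the partial index is usable;
# inspect with EXPLAIN QUERY PLAN to confirm.
plan = cur.execute("""EXPLAIN QUERY PLAN
    SELECT entity_id FROM entities WHERE attribute1 = 'Pontiac'""").fetchall()

rows = cur.execute(
    "SELECT entity_id FROM entities WHERE attribute1 = 'Pontiac'").fetchall()
```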

If you implement an EAV with only two tables, the number of joins between tables will be less than in my example, but the queries will still be hard to write and maintain, and you will need to do several (outer) joins to extract data, which will reduce performance, even with a great index, when you store a lot of data. For example, imagine that you want to get the brand, model and color of your cars. Your SELECT would look like this:

SELECT e.name, a1.value AS brand, a2.value AS model, a3.value AS color
FROM entities e
LEFT JOIN entity_attributes a1 ON e.entity_id = a1.entity_id
  AND a1.attribute_id = (SELECT attribute_id FROM attributes WHERE name = 'brand')
LEFT JOIN entity_attributes a2 ON e.entity_id = a2.entity_id
  AND a2.attribute_id = (SELECT attribute_id FROM attributes WHERE name = 'model')
LEFT JOIN entity_attributes a3 ON e.entity_id = a3.entity_id
  AND a3.attribute_id = (SELECT attribute_id FROM attributes WHERE name = 'color');

As you see, you would need one (left) outer join for each attribute you want to query (or filter). With the pseudo-generic model the query will be like this:

SELECT name, attribute1 brand, attribute7 model, attribute35 color
FROM entities;

Also, take into account the potential size of your entity_attributes table. If each entity can have up to 40 attributes, let's say that 20 of them are non-null for each one. If you have 10,000 entities, your entity_attributes table will have 200,000 rows, and you will be querying it through one huge index. With the pseudo-generic model you will have 10,000 rows and one small index per column.

antonio answered Oct 15 '22