I keep coming across scenarios where it will be useful to store a set of arbitrary data in a table using a per-row key/value model, rather than a rigid column/field model. The problem is, I want to store the values with their correct data type rather than converting everything to a string. This means I have to choose either a single table with multiple nullable columns, one for each data type, or a set of value tables, one for each data type. I'm also unsure as to whether I should use full third normal form and separate the keys into a separate table, referencing them via a foreign key from the value table(s), or if it would be better to keep things simple and store the string keys in the value table(s) and accept the duplication of strings.
Old/bad:
This solution makes adding additional values a pain in a fluid environment because the table needs to be modified regularly.
MyTable ============================ ID Key1 Key2 Key3 int int string date ---------------------------- 1 Value1 Value2 Value3 2 Value4 Value5 Value6
Single Table Solution
This solution allows simplicity via a single table. The querying code still needs to check for nulls to determine which data type the field is storing. A check constraint is probably also required to ensure only one of the value fields contains non-null data.
DataValues ============================================================= ID RecordID Key IntValue StringValue DateValue int int string int string date ------------------------------------------------------------- 1 1 Key1 Value1 NULL NULL 2 1 Key2 NULL Value2 NULL 3 1 Key3 NULL NULL Value3 4 2 Key1 Value4 NULL NULL 5 2 Key2 NULL Value5 NULL 6 2 Key3 NULL NULL Value6
Multiple-Table Solution
This solution allows for more concise purposing of each table, though the code needs to know the data type in advance as it needs to query a different table for each data type. Indexing is probably simpler and more efficient because there are less columns that need indexing.
IntegerValues =============================== ID RecordID Key Value int int string int ------------------------------- 1 1 Key1 Value1 2 2 Key1 Value4 StringValues =============================== ID RecordID Key Value int int string string ------------------------------- 1 1 Key2 Value2 2 2 Key2 Value5 DateValues =============================== ID RecordID Key Value int int string date ------------------------------- 1 1 Key3 Value3 2 2 Key3 Value6
How do you approach this problem? Which solution is better?
Also, should the key column be separated into a separate table and referenced via a foreign key or be should it be kept in the value table and bulk updated if for some reason the key name changes?
A key-value database is a type of nonrelational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects.
Key-Value stores are most primitive and the first datastores to be invented. The examples of key-value stores are: Amazon dynamo, memcachedb, voldemort, redis and riak. The fourth category of NoSQL datastores is Graph Oriented data stores. These are the most recent kind of datastores.
key-value store, or key-value database is a simple database that uses an associative array (think of a map or dictionary) as the fundamental data model where each key is associated with one and only one value in a collection.
Another alternative to a single Table design is store the DataType
where you have an internal enum that differentiates the data type ie:
1 - string
2 - int
3 - date
4 - etc
Table
============================
ID Key Value DataType
int string string int
----------------------------
1 Key Value 1
I recommend storing plain text in the value as in the Pragmatic Programmer
book, plain text will outlive all data types, and it allows you to perform any manipulation to it.
As what Thomas said, there is always a discipline trade off for EAVs. The discipline comes when you insert the data to ensure that the data type is validated and inserted as the correct type you expect.
When you query you just parse depending on data type on your code.
ie: if response.dataType == enum.date { parseDate(response)}
else if response.dataType == enum.int { parseInt(response)} // etc etc
worst case in production if it fails in production and insert the incorrect datatype, you can just alter the datatype in your db and your code should parse accordingly
I just want to state/iterates that EAVs should be used in moderation, and there are certain trade offs going this path, and I recommend reading on them before continuing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With