In my codebase I recently came across a design decision made by the team where key-value pairs are stored in a formatted manner within a single column of a relational database (MySQL). There is a universal set of metadata keys, and any given record may have only a subset of them. For a given record, its metadata subset and the corresponding values are stored in one column, formatted as follows:
Key1:Value1\n\nKey2:Value2\n\nKey3:Value3\n\n.....
Getting the metadata for a given record ID then boils down to running a simple SELECT and parsing the result to populate an in-memory dictionary.
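For illustration, the parsing step could look roughly like the sketch below. It assumes the column value has already been fetched as a string, that pairs are separated by two newlines, and that the first colon separates key from value; `parse_metadata` is a hypothetical name, not the team's actual code.

```python
def parse_metadata(blob: str) -> dict[str, str]:
    # Turn the stored column text into a dictionary of key -> value.
    result = {}
    for entry in blob.split("\n\n"):
        if not entry:
            continue  # skip the empty piece left by the trailing separator
        # Split on the first colon only, so values may contain colons.
        key, _, value = entry.partition(":")
        result[key] = value
    return result


# Hypothetical column value in the format shown above.
blob = "Key1:Value1\n\nKey2:Value2\n\nKey3:Value3\n\n"
metadata = parse_metadata(blob)
print(metadata)
```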
The rationale for doing this was cited as follows:
- Better performance than maintaining a denormalized table consisting of the columns recordId/Key/Value.
- Scalability
- To be conservative on space on the database server.
I can see the logic of storing these pairings in a database column, but something tells me this might cause problems in the long run and may not be the panacea for our "scalability" woes.
Can somebody give some feedback on what might be wrong with this approach, and on the best practices for storing and retrieving information like this on systems under heavy load?
Thanks
Obviously it depends on the particular case, but this sort of 1NF violation is generally a bad approach. One significant problem is that you can't ever query on the metadata. (E.g., "SELECT WHERE key2 = 'value3'") Another is that you can't ever update a single key/value without parsing, adjusting, un-parsing, and rewriting the whole large set. To address the claims individually:
Better performance: Has this claim actually been tested against your data? If you only ever need one key/value from the record, you currently have to pay the database overhead to read the whole set, the network overhead to transport it to the client, and the CPU overhead to parse out the one piece you need. Picking out a single value is precisely what the database was designed for, so you're essentially disabling the component that excels at that sort of work and poorly emulating it with unnecessary client-side programming.
Scalability: How do they figure that? Storing all key/value pairs in a single field will degrade as the number of pairs increases.
Conserving space: Almost certainly irrelevant. Disk space is cheaper than bad design.
P.S. What happens if you have a value with two newlines in it?
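A small sketch of what goes wrong, assuming the format is serialized and parsed naively with no escaping (the `serialize` and `parse` helpers here are hypothetical stand-ins for whatever the application does):

```python
def serialize(metadata: dict[str, str]) -> str:
    # Write each pair as "key:value" followed by the two-newline separator.
    return "".join(f"{key}:{value}\n\n" for key, value in metadata.items())


def parse(blob: str) -> dict[str, str]:
    # Split on the separator, then on the first colon of each entry.
    result = {}
    for entry in blob.split("\n\n"):
        if entry:
            key, _, value = entry.partition(":")
            result[key] = value
    return result


# A value that legitimately contains a blank line.
original = {"title": "Report", "notes": "first paragraph\n\nsecond paragraph"}
round_tripped = parse(serialize(original))

print(round_tripped == original)  # False: the value was cut at the blank line
print(round_tripped)
```

The second half of the value is silently reinterpreted as a new key with an empty value, so the format needs an escaping rule (or a different encoding) before it can hold arbitrary text.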
The big question is whether the pairs make sense in isolation, and how often you need to select individual pairs.
If it's mainly a property bag stored as name = value, and the pairs are related, then storing them in one lump saves space and time.
If you want quick, easy access to individual pairs, then a table with name and value columns makes sense, as long as the names are unique, of course. That will use up more space, and if you need to access more than one pair in a hit, you lose some of the advantage.
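As a rough sketch of the table-based option, the example below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere. The table name `record_metadata` and its columns are assumptions made for illustration, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (record, name) pair; the primary key enforces unique names
# within a record.
conn.execute(
    """
    CREATE TABLE record_metadata (
        record_id INTEGER NOT NULL,
        name      TEXT    NOT NULL,
        value     TEXT    NOT NULL,
        PRIMARY KEY (record_id, name)
    )
    """
)
conn.executemany(
    "INSERT INTO record_metadata (record_id, name, value) VALUES (?, ?, ?)",
    [(1, "Key1", "Value1"), (1, "Key2", "Value2"), (2, "Key2", "Value3")],
)

# Fetch a single pair without reading or parsing the rest of the set.
one_value = conn.execute(
    "SELECT value FROM record_metadata WHERE record_id = ? AND name = ?",
    (1, "Key2"),
).fetchone()[0]

# Query on the metadata itself: which records have Key2 = 'Value3'?
matching_ids = [
    row[0]
    for row in conn.execute(
        "SELECT record_id FROM record_metadata "
        "WHERE name = ? AND value = ? ORDER BY record_id",
        ("Key2", "Value3"),
    )
]

# Update one pair in place; the other pairs are untouched.
conn.execute(
    "UPDATE record_metadata SET value = ? WHERE record_id = ? AND name = ?",
    ("Changed", 1, "Key1"),
)

# Reading the whole bag for a record is still a single query.
bag = dict(
    conn.execute(
        "SELECT name, value FROM record_metadata WHERE record_id = ?", (1,)
    )
)
print(one_value, matching_ids, bag)
```

Whether this beats the single-column approach for a given workload is something to measure rather than assume; the sketch only shows which operations become a single statement.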
There's no right or wrong to this one. There might be a best, but that could easily change. We use both approaches on a case by case basis.