There seems to be a big push for key/value-based databases, which I believe memcache to be.
Is the value usually some sort of collection or XML file that would hold more meaningful data?
If so, is it generally faster to deserialize that data than to do traditional JOINs and SELECTs on tables that return a row-based result set?
Key-value databases scale to very large data sets and extremely high volumes of state changes while serving millions of simultaneous users, using distributed processing and storage. They also have built-in redundancy, so they can tolerate the loss of storage nodes.
A telephone directory is a good example, where the key is the person or business name, and the value is the phone number. Stock trading data is another example of a key-value pair.
You can store a value, such as an integer, a string, a JSON structure, or an array, along with a key used to reference that value. For example, a simple key-value database might store the value "Douglas Adams" under a key such as cust1237.
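To make that concrete, here is a minimal sketch of the access pattern in Python, using a plain dict as a stand-in for a real store such as memcached (the key cust1237 and the JSON fields are just illustrative):

```python
import json

# A plain dict standing in for a real key-value store (memcached, Redis, ...).
# The access pattern is the same either way: one key, one opaque serialized value.
store = {}

# Serialize whatever structure you like into the value...
store["cust1237"] = json.dumps({"name": "Douglas Adams", "orders": [42, 101]})

# ...and deserialize it on the way out. The store itself has no schema and
# no notion of a JOIN; it only understands get/set by key.
customer = json.loads(store["cust1237"])
print(customer["name"])  # Douglas Adams
```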
Examples of popular key-value databases:
- Aerospike: an open-source database optimized for in-memory storage.
- Berkeley DB: another open-source option, a high-performance database storage library, although relatively basic.
- Couchbase: interestingly, allows for text searches and SQL-style querying.
What has happened is that some really, really, REALLY big web sites like Google and Amazon occupy a teeny, tiny niche where their data storage and retrieval requirements are so different from anyone else's that a new way of storing/retrieving data is called for. I'm sure these guys know what they are doing; they are very good at what they do.
However, this then gets picked up, reported on, and distorted into "relational databases aren't up to handling data for the web." Readers then start to think, "Hey, if relational databases aren't good enough for Amazon and Google, they aren't good enough for me."
Both inferences are wrong: 99.9% of all databases (including those behind web sites) are not in the same ballpark as Amazon and Google, not within several orders of magnitude. For that 99.9%, nothing has changed; relational databases still work just fine.
As with most things, "it depends". If the joins are relatively inconsequential (that is, a small number of joins on well-keyed data) and you are storing especially complex data, it may be better just to stick with the more complex relational query.
It's also a matter of freshness. In many cases the purpose of a join is to bring together very disparate data, that is, data which varies widely in its relative freshness. It can add considerable complexity and overhead to keep a key-value pair table synchronized when a small slice of the data across a large number of pairs is updated. System complexity can itself be considered a form of performance cost: the time, risk, and cost of making a change to a complex system without impacting performance are often far greater than for a simple one.
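As a hypothetical illustration of that synchronization cost: if each order value embeds a copy of its customer's name, renaming the customer means finding and rewriting every one of those pairs (the keys and fields here are made up):

```python
import json

# Denormalized pairs: each order carries its own copy of the customer's name.
store = {
    f"order:{n}": json.dumps({"cust_id": "cust1237", "cust_name": "Douglas Adams"})
    for n in range(3)
}

def rename_customer(cust_id, new_name):
    # Every pair embedding the stale name must be found and rewritten;
    # a real store would need a full scan or a secondary index to do this.
    for key, raw in store.items():
        value = json.loads(raw)
        if value.get("cust_id") == cust_id:
            value["cust_name"] = new_name
            store[key] = json.dumps(value)

rename_customer("cust1237", "Douglas N. Adams")
```

In a normalized design the same change is a single-row UPDATE, and every join sees the fresh value immediately.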
The best solution is always to code what works as simply as you can. In most cases I'd say this means create a fully normalized database design and join the crap out of it. Only revisit your design after performance becomes an obvious problem. When you analyze the issue, it will also be obvious where the problems lie and what needs to be done to fix them. If it's reducing joins, then so be it. You'll know when you need to know.
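For instance, here is a minimal sketch of that normalized approach using SQLite (the table and column names are invented for illustration):

```python
import sqlite3

# Two normalized tables: the customer's name lives in exactly one row.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1237, 'Douglas Adams');
    INSERT INTO orders VALUES (1, 1237, 42.0), (2, 1237, 19.5);
""")

# The join assembles the data at read time; a rename is one UPDATE and
# every subsequent query sees it immediately.
rows = con.execute("""
    SELECT c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
""").fetchall()
print(rows)  # [('Douglas Adams', 42.0), ('Douglas Adams', 19.5)]
```

Only if profiling later shows joins like this to be the bottleneck is it worth denormalizing into key-value pairs.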