
When to use a key-value store for web development?

When would someone use a key-value (Redis, memcache, etc) store for web development? An actual use case would be most helpful.

My confusion is that a simple database seems so much more functional because, to my understanding, it can do everything a key-value store can do PLUS it also allows you to do filtering/querying. Meaning, to my understanding, you can NOT do a filter like:

SELECT * FROM homes WHERE price > 100000

with a key-value store.

Example

Let's pretend that StackOverflow uses a key-value store (memcache, redis, etc).

How would a key-value store benefit StackOverflow's hosting needs?

Jacjoi asked Aug 04 '11 02:08


5 Answers

I can't answer the question of when to use a key-value (herein kv) data store, but I can show you some examples and answer your StackOverflow example.

With database access, most of what you need is a kv store. For example, a user logs in with the username "joe". So you look up "user:joe" in your database and retrieve his password (the hash, of course). Or maybe you have his password under "user:pass:joe"; it really doesn't matter. If this were Stack Overflow and you were rendering the page http://stackoverflow.com/questions/6935566/when-to-use-a-key-value-store-for-web-development, you would look up "question:6935566" and use that. It is simple to see how kv stores can solve most of your problems.
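To make the lookups above concrete, here is a minimal sketch in Python. A plain dict stands in for the kv store, and the `put`/`get` helpers and the stored fields are my own placeholders, not the API of any particular product.

```python
import json

# A plain dict stands in for the kv store (an assumption for illustration).
store = {}

def put(key, value):
    """Serialize the value to JSON and store it under the key."""
    store[key] = json.dumps(value)

def get(key):
    """Return the decoded value, or None if the key is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Hypothetical records; the field names are placeholders.
put("user:joe", {"password_hash": "<hash goes here>"})
put("question:6935566", {"title": "When to use a key-value store for web development?"})

# Rendering the question page needs exactly one lookup by key.
question = get("question:6935566")
```

Every access here is a single get by a key you can compute from the request, which is the case kv stores are built for.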

I would say that a kv store provides a subset of the functionality of a traditional RDBMS. This is because the design of a traditional RDBMS causes many scaling issues, and it generally loses features as you scale. kv stores don't come with these features, so they don't limit you. However, these features can often be rebuilt on top anyway, designed from the core to be scalable (because it becomes immediately obvious if they are not).

However, that doesn't mean there is nothing you can't do. For example, you mention querying. This is a pitfall of many kv stores, as they are generally agnostic of the value (not always true; Redis, for example, and others) and have no way of finding what you are looking for. Worse, they are not designed to do that quickly; they are just really quick at looking up by key.

One solution to this problem is to sort your keys lexicographically and allow range queries. This is essentially "give me everything between question:1 and question:5". Now that example is fairly useless, but there are many uses of range queries.

You said you want all houses more than $100 000. If you wanted to be able to do this, you would create an index of houses by price. Say you had the following houses:

    house:0 -> {"color":"blue","sold":false,"city":"Stackoverville","price":500000}
    house:1 -> {"color":"red","sold":true,"city":"Toronto","price":150000}
    house:2 -> {"color":"beige","sold":false,"city":"Toronto","price":40000}
    house:3 -> {"color":"blue","sold":false,"city":"The Blogosphere","price":110000}

In SQL you would store each field in its own column rather than keeping everything in one (in this case JSON) document, and you could SELECT * FROM houses WHERE price > 100000. This seems all fine and dandy but, if there isn't an index built, it requires looking at every house in your table and checking its price, which could be slow if you have a couple million houses. So with a kv store you need an index as well. The main difference is that the SQL database would silently do the slow thing, whereas the kv store can't do it at all.

If you don't have range queries you would need to stick your index in a single document, which makes safely updating it a pain and means that you would have to download the whole index for every query, again, limiting scalability.

house:index:price -> [{"price":500000,"id":"0"},{"price":150000,"id":"1"},{"price":110000,"id":"3"},{"price":40000,"id":"2"}] 
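Here is a rough sketch of why the single-document index hurts, again with a dict standing in for the store. The function names are mine, and a real store would add network transfer on top of this.

```python
import json

# A dict stands in for the kv store; the whole index lives under one key.
store = {
    "house:index:price": json.dumps([
        {"price": 500000, "id": "0"},
        {"price": 150000, "id": "1"},
        {"price": 110000, "id": "3"},
        {"price": 40000, "id": "2"},
    ])
}

def houses_above(min_price):
    # The entire index is fetched and decoded, even if only a few entries match.
    index = json.loads(store["house:index:price"])
    return [entry["id"] for entry in index if entry["price"] > min_price]

def add_to_index(house_id, price):
    # Read-modify-write of the whole document. Two concurrent writers could
    # overwrite each other's changes unless the store offers locking or
    # some form of compare-and-swap.
    index = json.loads(store["house:index:price"])
    index.append({"price": price, "id": house_id})
    index.sort(key=lambda entry: entry["price"], reverse=True)
    store["house:index:price"] = json.dumps(index)
```

Both the query and the update touch every entry in the index, so the cost grows with the number of houses rather than with the number of results.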

But if you have range queries (often called keyscans) you can create an index like this:

    house:index:price:040000 -> 2
    house:index:price:110000 -> 3
    house:index:price:150000 -> 1
    house:index:price:500000 -> 0

And then you could request the keys between house:index:price:100000 and house:index:price:: (the ':' character is the character after '9' in ASCII) and you would get [3,1,0], which is all the houses more expensive than $100 000 (they are also helpfully in order). Another nice thing about this is that they will likely be on one "partition" of your cluster, so this query will take about the same time as a single get (plus the tiny extra transfer overhead), or two gets if your range happens to cross a server boundary (but these can be done in parallel!).
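The range query over that index can be sketched in a few lines of Python. A sorted list plus `bisect` stands in for a store that keeps its keys in lexicographic order; `key_range` and `price_key` are names I made up for the example.

```python
from bisect import bisect_left

# The price index from above; a real store would keep the keys sorted for us.
index = {
    "house:index:price:040000": "2",
    "house:index:price:110000": "3",
    "house:index:price:150000": "1",
    "house:index:price:500000": "0",
}
sorted_keys = sorted(index)

def key_range(start, end):
    """Return the values of all keys k with start <= k < end, in key order."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return [index[k] for k in sorted_keys[lo:hi]]

def price_key(price):
    # Zero-pad to six digits so that string order matches numeric order.
    return f"house:index:price:{price:06d}"

# ':' sorts right after '9', so this end key is past every possible price.
expensive = key_range(price_key(100000), "house:index:price::")
print(expensive)  # ['3', '1', '0']
```

The zero-padding matters: without it, "40000" would sort after "110000" as a string. The start key is inclusive here, so a house priced at exactly 100000 would also be returned.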

So that shows how to do queries in a kv store. You can query anything that can be ordered as a string (just about anything) and look it up very quickly. If you don't have range queries you will need to store your whole index under one key which sucks, but if you have range queries it is very nice, and very fast. Here is a more complex example.

I want unsold houses in Toronto that are more than $100 000. I simply have to design my index. (I added in a couple of houses to make it more meaningful.) At first thought you might just build another index for every property, but you will quickly realize that this means you have to select every unsold house and download it from the database. (This is what I meant when I said scaling problems are immediately obvious.) The solution is to use a multi-index. Once built, you can select exactly the values you want.

    house:index:sold:city:price:f~Fooville~000010:5        -> ""
    house:index:sold:city:price:f~Toronto~040000:2         -> ""
    house:index:sold:city:price:f~Toronto~140000:4         -> ""
    house:index:sold:city:price:t~Stackoverville~500000:0  -> ""
    house:index:sold:city:price:t~The Blogosphere~110000:3 -> ""
    house:index:sold:city:price:t~Toronto~150000:1         -> ""

Now, unlike the last example, I put the id in the key. This allows two houses to have the same properties. I could have merged them in the value, but then adding and removing index entries becomes more difficult. I also chose to separate my data with a ~. This is because it is lexicographically after all of the letters, ensuring that the full name will be sorted and I don't have to pad every city to the same length. In a production system I would probably use the byte 255 or 0.

Now the range house:index:sold:city:price:f~Toronto~100000 - house:index:sold:city:price:f~Toronto~~ will select all houses that match the query. And the important thing to note is that the query scales linearly with the number of results. This does mean that you have to build an index for every set of properties that you want to query on (although the index in our example also works for sold and sold-city queries). This may seem like a lot of work, but in the end you realize that the only difference is that you are doing it, not your database. I'm sure we will begin to see libraries for this kind of thing coming out soon :D
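A sketch of the multi-index scan, under the same assumption as before that a sorted list with `bisect` stands in for a store with ordered keys. The helper names (`index_key`, `scan`, `house_id`) are hypothetical.

```python
from bisect import bisect_left

PREFIX = "house:index:sold:city:price:"

def index_key(house_id, sold, city, price):
    """Build one multi-index key: sold flag, city, zero-padded price, then id."""
    flag = "t" if sold else "f"
    return f"{PREFIX}{flag}~{city}~{price:06d}:{house_id}"

# The six index keys from the listing above. The values are empty strings,
# so a sorted list of keys is enough for this sketch.
sorted_keys = sorted([
    PREFIX + "f~Fooville~000010:5",
    PREFIX + "f~Toronto~040000:2",
    PREFIX + "f~Toronto~140000:4",
    PREFIX + "t~Stackoverville~500000:0",
    PREFIX + "t~The Blogosphere~110000:3",
    PREFIX + "t~Toronto~150000:1",
])

def scan(start, end):
    """Return all keys k with start <= k < end, in key order."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]

def house_id(key):
    # The id is whatever follows the last ':' in the key.
    return key.rsplit(":", 1)[1]

# The range from the text: unsold, Toronto, price key 100000 and up.
matches = [house_id(k) for k in scan(PREFIX + "f~Toronto~100000", PREFIX + "f~Toronto~~")]
print(matches)  # ['4']

# The same index answers the broader "all unsold houses" query by prefix.
unsold = [house_id(k) for k in scan(PREFIX + "f~", PREFIX + "f~~")]
print(unsold)  # ['5', '2', '4']
```

Note that the range as written starts at the 100000 price key, so it returns the unsold Toronto houses priced at $100 000 or more. The scan only ever touches the matching keys, which is where the "linear in the number of results" property comes from.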

After stretching the topic a bit, I have shown:

  • Some uses of a kv store.
  • How to do queries in a kv store.

I think that you will find that kv stores are enough for many applications and can often provide better performance and availability than a traditional RDBMS. That being said, every app is different, and therefore it is impossible to answer the original question.

Kevin Cox answered Sep 28 '22 08:09


Do not confuse a NoSQL type database with something like memcached (which is not intended to store data permanently).

A typical use for memcached is to store query results that can be accessed by a cluster of web servers, i.e. a shared cache. For example, on this page is a list of related posts, and there is likely a bit of work for the database to do to produce that list. If you do that every time someone loads the page, then you will create a lot of work for the database. Instead, the results, once retrieved for the first time, could be stored on a memcached server with the key being the page ID. Any of the web servers in the cluster can then fetch that information very quickly without having to constantly hit the database. After a while, the cache entry would be purged by memcached so that the results for old articles don't use up space. [Disclaimer: I've no idea if StackOverflow does this in reality].

A "NoSQL" database on the other hand is for storing information permanently. If your data schema is quite simple and so are your queries, then it may be faster than a standard SQL database. A lot of web applications don't need hugely complex databases, and so NoSQL databases can be a good fit.

Ben Strawson answered Sep 28 '22 08:09


There are two general viable use-cases for noSQL:

  1. Rapid application development
  2. Massively scalable systems

The fact that most noSQL solutions are effectively schema-less; require far less ceremony to operate; are light-weight (in terms of API); and provide significant performance gains in contrast to the more canonical relational persistence systems informs their suitability for the above 2 use-cases (in the general sense).

Being cynical -- or perhaps practical in the business sense -- one can propose a 3rd general use-case for noSQL systems (still informed by the above set of characteristics/features):

It is easier to grok, and any inexperienced (but un-brain-dead) aspiring geek can pick it up in a snap. That is a very powerful feature. (Try that with Oracle ..)

So, the use-cases of noSQL systems -- which in general can be characterized as relaxed persistent systems -- are all optimally informed by practical considerations.

There is absolutely no question -- outside of hugely massively scalable systems -- that RDBMS systems are formally perfect systems designed to ensure data integrity.

alphazero answered Sep 28 '22 08:09


Key-value stores are usually really fast, so they work well as a cache for data that is heavily accessed and rarely updated, which reduces the load on your DBs.

As you said, you are usually limited with queries (though MongoDB handles them pretty well), but key-value stores are mostly meant for accessing precise data: user X's profile, session X's info, etc.

A "traditional" DB will probably be more than enough for the average website, but if you experience high loads, key-value stores can really help your load times.

EDIT: And by "high loads", I mean really high loads. Key-value stores are rarely necessary.

See this comparison of key-value stores.

dee-see answered Sep 28 '22 09:09


Just adding to bstrawson's answer: memcached is a caching mechanism, while Redis is permanent storage, but both store data as key-value pairs.

Searching a key-value store (something like Redis or Membase) by value is like scanning every value in a relational database: too slow. If you want to do some querying, you may need to move to a document-oriented NoSQL database such as MongoDB or CouchDB, which support some querying.

In the near future you will be able to use Couchbase Server 2.0, which is meant to address the burning issues with NoSQL data querying through the newly introduced UnQL, and with caching (directly derived from the memcached source code).

Dasun answered Sep 28 '22 09:09