What is a good way to manage keys in a key-value store?

Trying to define some policy for keys in a key-value store (we are using Redis). The keyspace should be:

Shardable (can introduce more servers and spread out the keyspace between them)
Namespaced (there should be some mechanism to "group" keys together logically, for example by domain or associated concepts)
Efficient (use as little space as possible in the DB for keys, to leave room for as much data as possible)
As collision-free as possible (keys for two different objects should never be equal)

Two alternatives that I have considered are these:

Use prefixes for namespaces, separated by some character (like human_resources:person:<some_id>). The upside of this is that it is pretty scalable and easy to understand. The downside would be possible conflicts depending on the separator (what if the id has the character : in it?), and possibly size efficiency (too many nested namespaces might create very long keys).
Use some data structure (like a Sorted Set or Hash) to store namespaces. The main drawback to this would be loss of "shardability", since the structure that stores the namespaces would need to live in a single database.

Question: What would be a good way to manage a keyspace in a sharded setup? Should we use one of these alternatives, or is there some other, better pattern that we have not considered?

Thanks very much!

asked Oct 09 '13 18:10

Juan Carlos Coto

1 Answer

The generally accepted convention in the Redis world is option 1, i.e. namespaces separated by a character such as a colon. That said, the namespaces are almost always one level deep. For example: person:12321 instead of human_resources:person:12321.

How does this work with the 4 guidelines you set?

Shardable - This approach is shardable. Each key can get into a different shard or same shard depending on how you set it up.
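As a rough illustration of how a key can be mapped to a shard on the client side, here is a minimal sketch. The host names and the CRC32-modulo scheme are assumptions made for this example, not something Redis prescribes. Note that a plain modulo re-maps most keys when the number of shards changes; consistent hashing, or Redis Cluster's fixed hash slots, are the usual ways around that.

```python
import zlib

# Hypothetical shard addresses, for illustration only.
SHARDS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (CRC32)."""
    # zlib.crc32 gives the same value in every process, unlike the
    # built-in hash(), which is randomized for strings.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def server_for(key: str) -> str:
    """Return the (hypothetical) server that owns this key."""
    return SHARDS[shard_for(key, len(SHARDS))]


if __name__ == "__main__":
    for k in ("person:12321", "person:43432"):
        print(k, "->", server_for(k))
```

Because the shard is derived from the whole key, two keys in the same namespace can land on different servers, which is what makes the flat prefix scheme shardable.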

Namespaced - Namespaces as a way to avoid collisions work with this approach. However, namespaces as a way to group keys don't work out. In general, using keys as a way to group data is a bad idea. For example, what if the person moves from one department to another? If you change the key, you will have to update all references, and that gets tricky.

It's best to ensure the key never changes for an object. Grouping can then be handled externally by creating a separate index.

For example, let's say you want to group people by department, by salary range, and by location. Here's how you'd do it:

Individual people go in separate hashes with keys like persons:12321
Create a set for each grouping, for example persons_by:department, and store only the numeric identifier of each person in this set, for example [12321, 43432]. This way, you get the advantages of Redis' compact integer set (intset) encoding.
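The two steps above can be sketched as follows. This is only an illustration: FakeRedis is a tiny in-memory stand-in written for this example so it runs without a server, and it mimics just the hset, hget, sadd, srem and smembers calls of a redis-py style client. The per-department set name (persons_by:department:<name>) and the helper functions are assumptions, not part of any library.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in for the few client calls this sketch needs."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, name, mapping):
        self.hashes[name].update(mapping)

    def hget(self, name, field):
        return self.hashes[name].get(field)

    def sadd(self, name, *values):
        self.sets[name].update(values)

    def srem(self, name, *values):
        self.sets[name].difference_update(values)

    def smembers(self, name):
        return set(self.sets[name])


def person_key(person_id: int) -> str:
    # The key depends only on the immutable primary key.
    return f"persons:{person_id}"


def department_index(department: str) -> str:
    # Assumed naming: one set of person ids per department.
    return f"persons_by:department:{department}"


def save_person(r, person_id: int, name: str, department: str) -> None:
    r.hset(person_key(person_id), mapping={"name": name, "department": department})
    r.sadd(department_index(department), person_id)


def move_person(r, person_id: int, new_department: str) -> None:
    # Only the index sets and one hash field change; the key stays the same.
    # With a real server these steps would need a transaction to be atomic.
    old = r.hget(person_key(person_id), "department")
    r.srem(department_index(old), person_id)
    r.sadd(department_index(new_department), person_id)
    r.hset(person_key(person_id), mapping={"department": new_department})
```

Moving a person between departments touches only the index sets, so every other reference to persons:12321 remains valid.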

Efficient - The method explained above is pretty memory-efficient. To save some more memory, you can compress the keys further on the application side. For example, you can store p:12321 instead of persons:12321. You should do this only if you have determined via profiling that you need such memory savings. In general, it isn't worth the cost.
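To get a feel for how small the saving is, here is a back-of-the-envelope calculation. It counts only the raw bytes of the key names and ignores Redis' own per-key overhead, and the figure of one million keys is an assumption picked for illustration.

```python
def key_bytes_saved(long_prefix: str, short_prefix: str, num_keys: int) -> int:
    """Raw key-name bytes saved by shortening the prefix on every key."""
    per_key = len(long_prefix.encode("utf-8")) - len(short_prefix.encode("utf-8"))
    return per_key * num_keys


# "persons:" is 8 bytes and "p:" is 2 bytes, so 6 bytes are saved per key.
saved = key_bytes_saved("persons:", "p:", 1_000_000)
print(saved)  # 6000000 bytes, roughly 6 MB for a million keys
```

A few megabytes per million keys is usually small next to the size of the values themselves.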

Collision Free - This depends on your application. Each User or Person should have a primary key that never changes. Use this in your Redis key, and you won't have collisions.

You mentioned two problems with this approach, and I will try to address them.

What if the id has a colon?

It is of course possible, but your application's design should prevent it. It's best not to allow special characters in identifiers, because they will be used across multiple systems. For example, the identifier will very likely be part of a URL, and the colon is a reserved character in URLs too.

If you really must allow special characters in your identifier, you would have to write a small wrapper in your code that encodes the special characters. URL encoding is perfectly capable of handling this.
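A minimal sketch of such a wrapper, using percent-encoding from Python's standard library. The function names make_key and split_key are made up for this example, and the sketch assumes the namespace itself never contains a colon.

```python
from urllib.parse import quote, unquote


def make_key(namespace: str, raw_id: str) -> str:
    """Build 'namespace:id', percent-encoding the id so it cannot contain ':'."""
    # safe="" makes quote() encode every reserved character, including '/'.
    return f"{namespace}:{quote(raw_id, safe='')}"


def split_key(key: str) -> tuple[str, str]:
    """Recover (namespace, original id) from a key built by make_key."""
    namespace, _, encoded = key.partition(":")
    return namespace, unquote(encoded)


if __name__ == "__main__":
    key = make_key("person", "a:b/c")
    print(key)             # person:a%3Ab%2Fc
    print(split_key(key))  # ('person', 'a:b/c')
```

Since the encoded id contains no literal colon, the first colon in the key is always the namespace separator.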

Size Efficiency

There is a cost to long keys, but it isn't large. In general, you should worry about the size of your values rather than the keys. If you think keys are consuming too much memory, profile the database using a tool like redis-rdb-tools.

If you do determine that key size is a problem and want to save the memory, you can write a small wrapper that rewrites the keys using an alias.
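Such an alias wrapper could look roughly like this. The alias table and function names are hypothetical. The application keeps using the readable key, and only the stored form is shortened.

```python
# Hypothetical alias table: readable namespace -> short stored prefix.
ALIASES = {"persons": "p", "departments": "d"}
REVERSE = {short: long for long, short in ALIASES.items()}


def to_stored_key(key: str) -> str:
    """Rewrite 'persons:12321' to 'p:12321'; unknown namespaces pass through."""
    namespace, sep, rest = key.partition(":")
    return ALIASES.get(namespace, namespace) + sep + rest


def to_logical_key(stored: str) -> str:
    """Inverse of to_stored_key for namespaces listed in the alias table."""
    alias, sep, rest = stored.partition(":")
    return REVERSE.get(alias, alias) + sep + rest


if __name__ == "__main__":
    print(to_stored_key("persons:12321"))  # p:12321
    print(to_logical_key("p:12321"))       # persons:12321
```

One caveat with this sketch: an un-aliased namespace that happens to be literally named p or d would be mistaken for an alias on the way back, so the alias table has to be chosen with the full set of namespaces in mind.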

answered Oct 13 '22 20:10

Sripathi Krishnan

Tags: namespaces, database, key, redis, key-value-store
