
How to structure data in Riak?

Tags:

riak

I'm trying to figure out how to model data in Riak. Let's say you are building something like a CMS with two features, news and products. You need to be able to store this information for multiple clients X and Y. How would you typically structure this?

  1. One bucket per client, with two keys: news and products. Store multiple objects under each key and then use map/reduce to order them.

  2. Store both the news and the products in the same bucket, but with a new autogenerated key for each news item and product item. That is, one bucket for X and one for Y.

  3. One bucket per client/feature combination, that is, the buckets would be X-news, X-products, Y-news and Y-products. Then use map/reduce on the whole bucket to return the results in order.

Which would be the best way to handle this problem?

Fabian Alenius asked Feb 19 '11 16:02


2 Answers

I'd create 2 buckets: news and products. Then I'd prefix the keys in each bucket with the client name. I'd probably also include the date in news keys, so that date ranges are easy to query:

news/acme_2011-02-23_01
news/acme_2011-02-23_02
news/bigcorp_2011-02-21_01

Optionally, prefix product names with category names:

products/acme_blacksmithing_anvil
products/bigcorp_databases_oracle
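
A minimal Python sketch of helpers that build keys in this scheme. The function names (`news_key`, `product_key`) are hypothetical and not part of any Riak client; the only assumption taken from the examples above is the `_` separator and the two-digit sequence number.

```python
from datetime import date

SEP = "_"  # separator used in the example keys above


def _part(value: str) -> str:
    """Reject parts containing the separator, which would break prefix/token matching."""
    if not value or SEP in value:
        raise ValueError(f"key part must be non-empty and free of {SEP!r}: {value!r}")
    return value


def news_key(client: str, day: date, seq: int) -> str:
    """Build a news key such as 'acme_2011-02-23_01' (client, ISO date, sequence)."""
    return SEP.join((_part(client), day.isoformat(), f"{seq:02d}"))


def product_key(client: str, category: str, name: str) -> str:
    """Build a product key such as 'acme_blacksmithing_anvil'."""
    return SEP.join(_part(p) for p in (client, category, name))


print(news_key("acme", date(2011, 2, 23), 1))
print(product_key("bigcorp", "databases", "oracle"))
```

Because the separator carries meaning, a client or category name that itself contains `_` would shift the token positions, which is why the sketch rejects such names.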

Then in your map/reduce you could use key filtering:

// BigCorp News items
{
  "inputs":{
     "bucket":"news",
     "key_filters":[["starts_with", "bigcorp"]]
  }
  // ... rest of mapreduce job
}

// Acme Blacksmithing items
{
  "inputs":{
     "bucket":"products",
     "key_filters":[["starts_with", "acme_blacksmithing"]]
  }
  // ... rest of mapreduce job
}

// News for all clients from Feb 12th to 19th
{
  "inputs":{
     "bucket":"news",
     "key_filters":[["tokenize", "_", 2],
                    ["between", "2011-02-12", "2011-02-19"]]
  }
  // ... rest of mapreduce job
}
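
To make the jobs above concrete, here is a rough Python sketch that assembles a complete job body and also emulates the three filters locally, so you can check which keys a filter list would select. Assumptions: the `query` phase (the built-in `Riak.mapValuesJson` map function, which expects JSON values) is my own filler for the "rest of mapreduce job", and `matches` is a local emulation for illustration, not code that runs inside Riak.

```python
import json


def key_filter_job(bucket, key_filters):
    """Assemble a MapReduce job body; the map phase here is an assumed placeholder."""
    return {
        "inputs": {"bucket": bucket, "key_filters": key_filters},
        "query": [{"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}],
    }


def matches(key, key_filters):
    """Local emulation of 'tokenize', 'starts_with' and 'between' on a single key."""
    value = key
    for name, *args in key_filters:
        if name == "tokenize":
            sep, n = args  # n is 1-based, as in the examples above
            parts = value.split(sep)
            if n > len(parts):
                return False
            value = parts[n - 1]
        elif name == "starts_with":
            return value.startswith(args[0])
        elif name == "between":
            low, high = args[0], args[1]
            return low <= value <= high  # treated as inclusive in this sketch
        else:
            raise ValueError(f"filter not emulated here: {name}")
    return True


date_range = [["tokenize", "_", 2], ["between", "2011-02-12", "2011-02-19"]]
body = json.dumps(key_filter_job("news", date_range))  # JSON body to POST to /mapred

keys = ["acme_2011-02-15_01", "acme_2011-02-23_01", "bigcorp_2011-02-21_01"]
print([k for k in keys if matches(k, date_range)])
```

One thing the emulation makes visible: `["starts_with", "acme"]` would also select keys for a client named `acmecorp`, so including the separator in the prefix (`"acme_"`) is the safer choice.
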
KevBurnsJr answered Oct 18 '22 17:10


An even more efficient approach than key filtering (as Kev Burns recommends) is to model this scenario with Secondary Indexes or Riak Search. Key filters still have to walk every key in the bucket, whereas an index query only touches the matching entries.

Take a look at my answers to "Which clustered NoSQL DB for a Message Storing purpose?" and "Links in Riak: what can they do/not do, compared to graph databases?" for a discussion of similar cases.

You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.

1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and on how you query. If you will always fetch both news and products for a company in a single query, you might as well store them in a single bucket. But I recommend 2 separate buckets, which makes them easier to keep track of, especially if you'll have something like separate tabs or pages for "Company X - Products" and "Company X - News". If you need to combine them into a single feed, make 2 queries (one for news and one for products) and merge the results in the client code (by date or whatever).

2) If a news/product item can have one and only one company that it belongs to, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.

3) If there's a many-to-many relationship (a news/product item can belong to several companies, for example a news item about a joint venture between 2 companies), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket, and for each company mentioned in a news story, insert a Mention object with its own unique key and a secondary index on company_key. Its value would contain a type ('news' or 'product') and an item_key (the news key or product key). Extracting relationships into separate Riak objects like this allows you to do a lot of interesting things: tag them arbitrarily using Riak Search, query them for subscription event notifications, etc.
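
As a rough illustration of points 2 and 3, the Python sketch below builds the pieces of the HTTP requests involved without sending anything. Assumptions: the index is a string index named `company_key_bin` (the `_bin` suffix is how Riak's HTTP interface marks string-valued 2i fields), values are stored as JSON, and the helper names (`index_headers`, `index_query_path`, `mention`) are hypothetical.

```python
import json
from urllib.parse import quote

INDEX = "company_key_bin"  # assumed index name; "_bin" marks a string index


def index_headers(company_key):
    """Headers to send when storing an object so it is indexed by company."""
    return {
        "Content-Type": "application/json",
        f"x-riak-index-{INDEX}": company_key,
    }


def index_query_path(bucket, company_key):
    """Path of the exact-match 2i query for everything one company owns in a bucket."""
    return f"/buckets/{quote(bucket, safe='')}/index/{INDEX}/{quote(company_key, safe='')}"


def mention(company_key, item_type, item_key):
    """Headers and JSON body for one Mention object (many-to-many case)."""
    if item_type not in ("news", "product"):
        raise ValueError(f"unknown item type: {item_type!r}")
    body = json.dumps({"type": item_type, "item_key": item_key})
    return index_headers(company_key), body


# One news story about a joint venture yields one Mention per company.
story_key = "2011-02-23_joint-venture"  # hypothetical news key
mentions = [mention(c, "news", story_key) for c in ("acme", "bigcorp")]

print(index_query_path("mentions", "acme"))
```

Querying the mentions bucket by company then returns Mention keys; the client reads each Mention and follows its `item_key` into the news or products bucket.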

Dmitri Zagidulin answered Oct 18 '22 16:10