
Riak backup solution for a single bucket

What solutions would you recommend for backing up a single Riak bucket to a file, either by streaming or by snapshot?

user650842 asked Mar 28 '11 11:03


2 Answers

Backing up just a single bucket is going to be a difficult operation in Riak.

All of the solutions will boil down to the following two steps:

  1. List all of the objects in the bucket. This is the tricky part, since there is no "manifest" or a list of contents of any bucket, anywhere in the Riak cluster.

  2. Issue a GET to each one of those objects from the list above, and write it to a backup file. This part is generally easy, though for maximum performance you want to make sure you're issuing those GETs in parallel, in a multithreaded fashion, and using some sort of connection pooling.
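Step 2 above can be sketched roughly as follows. This is a minimal illustration, not a production tool: the function names `make_http_fetch` and `backup_objects`, the JSON-lines output format, and the base64 encoding of values are all assumptions made for the example. It stores only the object body (no content type, vector clock, indexes, or siblings) and does not do real connection pooling.

```python
import base64
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TextIO
from urllib.parse import quote


def make_http_fetch(base_url: str, bucket: str) -> Callable[[str], bytes]:
    """Return a function that GETs one object body over Riak's HTTP API.

    Hypothetical helper: one new connection per request, no pooling.
    """
    def fetch(key: str) -> bytes:
        url = f"{base_url}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    return fetch


def backup_objects(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    out: TextIO,
    max_workers: int = 8,
) -> int:
    """Fetch every key in parallel and write one JSON line per object.

    Note: pool.map submits all keys up front, so the whole key list is
    held in memory; a very large bucket would need batching.
    """
    def fetch_pair(key: str):
        return key, fetch(key)

    count = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Results come back in the same order as the input keys.
        for key, body in pool.map(fetch_pair, keys):
            record = {"key": key, "value": base64.b64encode(body).decode("ascii")}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Assuming a node listening on Riak's usual HTTP port, usage might look like `backup_objects(keys, make_http_fetch("http://localhost:8098", "mybucket"), open("backup.jsonl", "w"))`, where `keys` comes from one of the listing methods below.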

As far as listing all of the objects goes, you have three choices.

One is to do a Streaming List Keys operation on the bucket via HTTP (e.g. /buckets/bucket/keys?keys=stream) or Protocol Buffers; see http://docs.basho.com/riak/latest/dev/references/http/list-keys/ and http://docs.basho.com/riak/latest/dev/references/protocol-buffers/list-keys/ for details. Under no circumstances should you do a regular, non-streaming List Keys operation: it will hang your whole cluster, and will eventually either time out or crash once the number of keys grows large enough.
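As a rough sketch of consuming the streaming response over HTTP: the stream is assumed here to arrive as a series of concatenated JSON objects of the form `{"keys": [...]}`, which may be split across read boundaries. The function names `iter_streamed_keys` and `stream_bucket_keys` are made up for this example.

```python
import io
import json
import urllib.request
from typing import Iterable, Iterator
from urllib.parse import quote


def iter_streamed_keys(chunks: Iterable[str]) -> Iterator[str]:
    """Yield keys from text chunks holding concatenated {"keys": [...]} objects."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # object is incomplete; wait for the next chunk
            buffer = buffer[end:]
            yield from obj.get("keys", [])


def stream_bucket_keys(base_url: str, bucket: str) -> Iterator[str]:
    """Stream all keys of a bucket using keys=stream (hypothetical helper)."""
    url = f"{base_url}/buckets/{quote(bucket, safe='')}/keys?keys=stream"
    with urllib.request.urlopen(url) as resp:
        # Wrap the byte stream so multi-byte characters are decoded safely.
        text = io.TextIOWrapper(resp, encoding="utf-8")
        yield from iter_streamed_keys(iter(lambda: text.read(8192), ""))
```

Because keys are yielded as they arrive, the caller never has to hold the full listing response in memory at once.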

Two is to issue a Secondary Index (2i) query to get that object list. See http://docs.basho.com/riak/latest/dev/using/2i/ for discussion and caveats.
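A possible shape for the 2i approach, as a sketch only. It assumes the special `$bucket` index and the `max_results` / `continuation` pagination parameters, which are available in newer Riak releases with a backend that supports 2i; check the linked documentation for your version. `get_json` is a hypothetical callable that performs an HTTP GET on the given path and returns the parsed JSON body.

```python
from typing import Callable, Iterator
from urllib.parse import quote, urlencode


def iter_keys_via_2i(
    get_json: Callable[[str], dict],
    bucket: str,
    page_size: int = 1000,
) -> Iterator[str]:
    """Yield every key in a bucket by paging through a 2i query.

    Assumes the response looks like {"keys": [...], "continuation": "..."}
    and that the continuation is absent on the last page.
    """
    path = f"/buckets/{quote(bucket, safe='')}/index/$bucket/_"
    continuation = None
    while True:
        params = {"max_results": page_size}
        if continuation:
            params["continuation"] = continuation
        page = get_json(f"{path}?{urlencode(params)}")
        yield from page.get("keys", [])
        continuation = page.get("continuation")
        if not continuation:
            break
```

Passing the HTTP call in as `get_json` keeps the paging logic independent of whichever HTTP client you use.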

And three, if you're using Riak Search, is to retrieve all of the objects via a single paginated search query. (However, Riak Search has a query result limit of 10,000 results, so this approach is far from ideal.)

For an example of a standalone app that can back up a single bucket, take a look at Riak Data Migrator, an experimental Java app that uses the Streaming List Keys approach combined with efficient parallel GETs.

Dmitri Zagidulin answered Dec 06 '22 23:12


The Basho function contrib site has an Erlang solution for backing up a single bucket. It is a custom function, but it should do the trick.

http://contrib.basho.com/bucket_exporter.html

Adam answered Dec 06 '22 23:12
