
Downsides of storing binary data in Riak?

What are the problems, if any, of storing binary data in Riak?

Does it affect the maintainability and performance of the cluster?

How would performance differ between using Riak for this and using a distributed file system?

mikeal asked May 23 '11 20:05



2 Answers

Adding to @Oscar-Godson's excellent answer: you're likely to experience problems with values much larger than 50 MB. Bitcask is best suited to values of up to a few KB. If you're storing large values, you may want to consider an alternative storage backend, such as Innostore.

I don't have experience with storing binary values, but we have a medium-sized cluster in production (5 nodes, on the order of 100M values, tens of TB) and we're seeing frequent errors when inserting and retrieving values that are hundreds of KB in size. Performance in this case is inconsistent: sometimes an operation works, sometimes it doesn't. So if you're going to test, test at scale.
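A common way to stay under per-value size limits in any key-value store is to split a large blob into smaller chunks and keep a small manifest that lists them. The sketch below is only an illustration of that idea, not something the answers here describe: the `store` is a plain Python dict standing in for the database, and the names `put_chunked`, `get_chunked`, the key layout, and the 1 MiB chunk size are all hypothetical choices.

```python
import hashlib

# Assumed chunk size (1 MiB); pick a real value by testing against your cluster.
CHUNK_SIZE = 1024 * 1024


def put_chunked(store, key, data, chunk_size=CHUNK_SIZE):
    """Split `data` into fixed-size chunks and store each under its own key.

    `store` is any dict-like object; here it stands in for the key-value store.
    Returns the list of chunk keys that were written.
    """
    chunk_keys = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk_key = f"{key}:chunk:{index}"  # hypothetical key layout
        store[chunk_key] = data[offset:offset + chunk_size]
        chunk_keys.append(chunk_key)

    # The manifest is small, so it stays well below any value-size limit.
    store[key] = {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": chunk_keys,
    }
    return chunk_keys


def get_chunked(store, key):
    """Reassemble a blob from its manifest and verify its checksum."""
    manifest = store[key]
    data = b"".join(store[chunk_key] for chunk_key in manifest["chunks"])
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError(f"checksum mismatch for {key!r}")
    return data
```

With a real client you would replace the dict reads and writes with the client's get and put calls. The trade-off is that a read now costs several round trips, and a partially written blob needs cleanup if a chunk write fails.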

We're also seeing problems with large values when running map-reduce queries: they simply time out. However, that may be less relevant to binary values (as @Matt-Ranney mentioned).

Also see @Stephen-C's answer here

Elad answered Oct 03 '22 16:10


The only problem I can think of is storing binary data larger than 50 MB, which they advise against. Storing arbitrary data is the whole point of Riak:

Another reason one might pick Riak is for flexibility in modeling your data. Riak will store any data you tell it to in a content-agnostic way — it does not enforce tables, columns, or referential integrity. This means you can store binary files right alongside more programmer-transparent formats like JSON or XML.

Source: Schema Design in Riak - Introduction
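To make the "content-agnostic" point concrete, here is a minimal sketch of how a binary file and a JSON document could be written to the same bucket over HTTP, differing only in the `Content-Type` header. It is written under assumptions: the `/buckets/<bucket>/keys/<key>` path and port 8098 reflect Riak's HTTP interface as I understand it (older releases used `/riak/<bucket>/<key>`), and the bucket name, keys, and `build_put_request` helper are hypothetical. The code only builds the requests; it does not send them.

```python
import urllib.request
from urllib.parse import quote


def build_put_request(base_url, bucket, key, data, content_type):
    """Build (but do not send) an HTTP PUT that stores `data` under bucket/key.

    The URL layout is an assumption about the target server; adjust it to
    match the version you are running.
    """
    url = f"{base_url}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


BASE_URL = "http://127.0.0.1:8098"  # assumed local node and default HTTP port

# A binary value (the first bytes of a PNG file) and a JSON value, side by side.
image_req = build_put_request(
    BASE_URL, "assets", "logo.png", b"\x89PNG\r\n\x1a\n", "image/png"
)
json_req = build_put_request(
    BASE_URL, "assets", "logo.json", b'{"alt": "logo"}', "application/json"
)

# To actually send a request against a running node:
# urllib.request.urlopen(image_req)
```

The store treats both bodies as opaque bytes; the content type is just metadata returned to whoever reads the value back.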

Oscar Godson answered Oct 03 '22 14:10