
Aerospike: keep data as blob or use 'bins'?

I need to keep data in Aerospike. This engine supports 'bins' (a 'bin' is like a column in a row or a field in a record). On the other hand, I can keep my records as serialized blobs. Records are always read from the database as a whole. That is, I never need to fetch only some 'columns' of a record; I need the entire record.

The question is: what is the most efficient way to store data for this scenario in terms of performance? Keep it unserialized and use 'bins' for all of the record's fields, or store it as a serialized blob in a single bin?

elgris asked Aug 06 '14 10:08




1 Answer

If you are sure that your only use case is fetching the full record, and never individual bins, it is better to store it as a single bin value. (Internally, multiple bins need multiple mallocs beyond a size limit.) In fact, you can set the namespace config option 'single-bin true', which optimizes things further. Be aware that once you set this option it can never be unset, even with a node restart; you have to clean the drives if you want to change it. If the namespace is in-memory, this restriction does not apply.
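To make the two layouts concrete, here is a minimal sketch of how the same record could be shaped for each one. It only builds the bin mappings and does not talk to a server; the bin name `data`, the use of JSON as the serialization format, and the helper names are assumptions made for illustration, not anything Aerospike requires.

```python
import json


def as_single_bin(record: dict, bin_name: str = "data") -> dict:
    """Serialize the whole record into one bytes value under a single bin.

    The bin name "data" and JSON encoding are illustrative choices only.
    """
    payload = json.dumps(record, separators=(",", ":"), sort_keys=True)
    return {bin_name: payload.encode("utf-8")}


def from_single_bin(bins: dict, bin_name: str = "data") -> dict:
    """Inverse of as_single_bin: decode the blob back into a dict."""
    return json.loads(bins[bin_name].decode("utf-8"))


def as_multiple_bins(record: dict) -> dict:
    """Keep one bin per field; the field names become the bin names."""
    return dict(record)


# Hypothetical sample record.
record = {"id": 42, "name": "elgris", "tags": ["a", "b"]}

single = as_single_bin(record)    # one bin holding a serialized blob
multi = as_multiple_bins(record)  # three bins: id, name, tags
```

Either mapping is what a client library would write as the record's bins. With the single-bin layout, the application pays for serialization and deserialization itself, and the database sees only an opaque blob.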

If there is a possibility that you will need to access a subset of the bins in the future, storing the fields as separate bins is better, because it saves network I/O, and that saving will be much bigger than the malloc overhead.
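A rough way to see the network I/O argument is to compare how many payload bytes have to travel in each layout when only some fields are needed. The sketch below is a back-of-the-envelope estimate under stated assumptions: it uses JSON-encoded sizes as a stand-in for the real wire format, ignores protocol and per-bin overhead, and the sample record and function names are hypothetical.

```python
import json


def encoded_size(value) -> int:
    """Approximate payload size of a value as its JSON-encoded byte length."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def blob_read_bytes(record: dict) -> int:
    """Single-blob layout: any read has to transfer the whole record."""
    return encoded_size(record)


def subset_read_bytes(record: dict, wanted: list) -> int:
    """Multi-bin layout: only the requested bins need to be transferred."""
    return sum(encoded_size(record[name]) for name in wanted)


# Hypothetical record with one large field and two small ones.
record = {"status": "ok", "retries": 3, "body": "x" * 10_000}

whole = blob_read_bytes(record)
partial = subset_read_bytes(record, ["status", "retries"])
```

Here `partial` comes out at a handful of bytes while `whole` is over 10,000, which is the kind of gap the answer is pointing at. If every read needs every field, the two numbers are about the same and the single-bin layout loses nothing.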

sunil answered Oct 16 '22 00:10