 

Best way to implement batch reads in Aerospike

Tags:

java

aerospike

I have a scenario where, for each request, I have to make a batch get of at least 1000 keys.

Currently I'm getting 2000 requests per minute, and this is expected to rise.

I've also read that Aerospike's batch get internally makes individual requests to the server, either concurrently or sequentially.

I am running Aerospike as a cluster (on SSD). Would it be more efficient to write a UDF (user-defined function) in Lua that makes the batch request and aggregates the results on the server, instead of making multiple hits from the client?

Please suggest whether Aerospike's default batch get will be efficient, or whether I have to do something else.

munish asked Feb 05 '18 12:02



2 Answers

Batch read is the right way to do it. Results are returned in the order of the keys specified in the list, and records that are not found are returned as null. The client parallelizes the keys by node, waits for every node to respond (unlike secondary index queries or scans, there is no callback in the client), collects the results from all nodes, and presents them back to the caller in the original key order. Make sure you have adequate memory in the client to hold all the returned batch results.
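To make the client-side flow described above concrete, here is a small, self-contained Java simulation of that scatter/gather pattern: group keys by owning node, read each node's sub-batch in parallel, block until all nodes answer, and return results in the original key order with misses as null. It does not use the Aerospike client; the `Node` class, `ownerOf` rule and `batchGet` method are hypothetical stand-ins. In particular, `ownerOf` uses `String.hashCode()` modulo the node count as a simplification, whereas Aerospike hashes each key to a digest and maps it to one of 4096 partitions owned by the cluster nodes. With the real Java client, the equivalent call is `client.get(batchPolicy, keys)`, which returns a `Record[]` aligned with the `Key[]` you pass in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchScatterGather {

    // Hypothetical stand-in for one cluster node: an in-memory key/value store.
    static final class Node {
        private final Map<String, String> data = new HashMap<>();

        void put(String key, String value) { data.put(key, value); }

        String get(String key) { return data.get(key); } // null when absent
    }

    // Simplified ownership rule (assumption). Aerospike instead hashes the key
    // to a digest and maps it to one of 4096 partitions owned by nodes.
    static int ownerOf(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    static String[] batchGet(List<Node> nodes, String[] keys) {
        // 1. Group the positions of the requested keys by owning node.
        Map<Integer, List<Integer>> positionsByNode = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            positionsByNode
                .computeIfAbsent(ownerOf(keys[i], nodes.size()), n -> new ArrayList<>())
                .add(i);
        }

        String[] results = new String[keys.length];
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            // 2. Send one sub-batch per node, all nodes in parallel.
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<Integer, List<Integer>> entry : positionsByNode.entrySet()) {
                Node node = nodes.get(entry.getKey());
                List<Integer> positions = entry.getValue();
                pending.add(pool.submit(() -> {
                    // Each slot is written by exactly one task, so no locking is needed.
                    for (int pos : positions) {
                        results[pos] = node.get(keys[pos]);
                    }
                }));
            }
            // 3. Block until every node has answered (no per-record callback).
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("batch read interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("node read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        // 4. Results are already in the original key order; misses stay null.
        return results;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node(), new Node(), new Node());
        // Store each record on the node that owns its key.
        for (String k : new String[] {"user:1", "user:2", "user:3", "user:4"}) {
            nodes.get(ownerOf(k, nodes.size())).put(k, "value-of-" + k);
        }
        String[] keys = {"user:3", "user:9", "user:1", "user:4"};
        System.out.println(Arrays.toString(batchGet(nodes, keys)));
        // prints [value-of-user:3, null, value-of-user:1, value-of-user:4]
    }
}
```

Writing into a preallocated array indexed by each key's original position is what preserves the key order, and calling `Future.get()` on every task mirrors the client blocking until all nodes have replied. The whole result array lives in client memory at once, which is why the answer stresses sizing the client's memory.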

pgupta answered Sep 21 '22 10:09



To UDF or Not to UDF?

First, you cannot do batch reads as a UDF, at least not in any way that's remotely efficient.

You have two kinds of UDF. The first is a record UDF, which is limited to operating on a single record. The record is locked as your UDF executes, so it can either read or modify the data, but it is sandboxed from accessing other records. The second is a stream UDF, which is read-only, and runs against either a query or a full scan of a namespace or set. Its purpose is to allow you to implement aggregations. Even if you're retrieving 1000 keys at a time, using stream UDFs to just pick a batch of keys from a much larger set or namespace is very inefficient. That aside, UDFs will always be slower than the native operations provided by Aerospike, and this is true for any database.

Batch Reads

Read the documentation for batch operations, and specifically the section on the batch-index protocol. There is a great pair of FAQs in the community forum you should read:

  • FAQ - Differences between getting single record versus batch
  • FAQ - batch-index tuning parameters

Capacity Planning

Finally, if you are getting 2000 requests per second at your application, and each of those turns into a batch read of 1000 keys, you need to make sure that your cluster is sized properly to handle 2000 * 1000 = 2 million reads per second. Tuning the batch-index parameters will help, but if you don't have enough aggregate SSD capacity to support those 2 million reads per second, your problem is one of capacity planning.
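The sizing arithmetic above can be checked with a few lines of Java. This is only a sketch of the raw count: it assumes every key in a batch costs one record read and ignores replication, retries and traffic peaks, and the class and method names are illustrative. Note that the question states 2000 requests per minute, while this answer assumes 2000 per second; the two differ by a factor of 60, so both are computed.

```java
public class CapacityEstimate {

    // Average key reads per second the cluster must serve over a time window,
    // rounded up so the estimate never undershoots. Each key is one record read.
    static long keyReadsPerSecond(long requests, long windowSeconds, long keysPerRequest) {
        long keyReads = Math.multiplyExact(requests, keysPerRequest);
        return (keyReads + windowSeconds - 1) / windowSeconds; // ceiling division
    }

    public static void main(String[] args) {
        // The answer's figure: 2000 requests per second, 1000 keys each.
        System.out.println(keyReadsPerSecond(2000, 1, 1000));  // prints 2000000
        // The question's figure: 2000 requests per minute, 1000 keys each.
        System.out.println(keyReadsPerSecond(2000, 60, 1000)); // prints 33334
    }
}
```

Either way, the per-key read rate (not the per-request rate) is the number the cluster's aggregate SSD throughput has to cover.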

Ronen Botzer answered Sep 21 '22 10:09
