
How to find read hot keys in an Aerospike cluster?

Tags:

aerospike

We have an Aerospike cluster of 8 nodes. During peak hours, one node shows a significantly higher load average than the others. In the AMC dashboard, that node also shows only a 30% read success rate. After reading a few similar issues posted in the Aerospike community, we suspected that hot keys might be the culprit.

Following (https://discuss.aerospike.com/t/how-to-identify-read-hotkeys/4193), we identified a few hot key digests in real time with tcpdump. Among the top 10 digests, one digest appears 90% of the time. We then followed (https://discuss.aerospike.com/t/faq-how-keys-and-digests-are-used-in-aerospike/4663) to recover the user key/record from those digests. We were able to map every digest to a user key except the one that appears 90% of the time.

Is there any way we can identify that hot key?

Sourav Paul asked Dec 02 '19 08:12


2 Answers

Depending on your version of Aerospike, you can also change the logging level for the rw-client context, which prints the digest in the logs. That may rule out false positives from the tcpdump method.

Turn on detail-level logging for the rw-client context:

asinfo -v "set-log:id=0;rw-client=detail"

Turn it back to info:

asinfo -v "set-log:id=0;rw-client=info"
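Once detail logging has run for a short while, the digests can be tallied from the log file. Below is a minimal Python sketch of that counting step. It assumes each digest shows up in a log line as 40 hexadecimal characters (20 bytes), optionally prefixed with `0x`. The exact log line format depends on the server version, so treat the pattern as an assumption to adjust. The name `top_digests` is just an illustrative helper, not part of any Aerospike tool.

```python
import re
from collections import Counter

# Assumption: a digest is printed as exactly 40 hex characters (20 bytes),
# optionally prefixed with "0x". Adjust this pattern to your actual log format.
DIGEST_RE = re.compile(r"\b(?:0x)?([0-9a-fA-F]{40})\b")


def top_digests(lines, n=10):
    """Return the n most frequent digests as (digest, count, share) tuples.

    `lines` is any iterable of log lines (for example, an open file).
    `share` is the fraction of all digest occurrences seen in the input.
    """
    counts = Counter()
    for line in lines:
        for match in DIGEST_RE.finditer(line):
            # Normalize case so the same digest is not counted twice.
            counts[match.group(1).lower()] += 1
    total = sum(counts.values())
    return [(digest, count, count / total)
            for digest, count in counts.most_common(n)]
```

To use it, open whatever log file your node writes to and pass the file object in, e.g. `with open(path) as f: print(top_digests(f))`. A digest that dominates both this list and the tcpdump list is less likely to be a false positive.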

Also, did you try the UDF from the article above to determine the set and key? (The original key is only stored if the client has explicitly enabled the sendKey policy.) Were there any corresponding record write failures, such as record too big? Or were clients possibly trying to read a non-existent record (read not found)? Write failures from a record too big would have the most impact on your network infrastructure. In both of these cases, the record would never make it to storage, so the digest would not match an existing record.

lvolmar answered Nov 01 '22 14:11


It is possible that the frequent read requests for the rogue digest are failing with a 'not found' error (hence only 30% read success). Aerospike still spends resources (CPU) searching for this digest in the index tree. If this is true, there will be no record in the database corresponding to the digest you found via tcpdump, so you will not get any details about it from the database. How did you identify the keys for the other digests, and what problem are you facing in finding the key for the rogue digest?

Another option is to trace the requests back to the application. For example, check in the tcpdump whether all the requests for this rogue digest are coming from a single machine. That would narrow down your search greatly. We have seen bots create such a mess in the past.
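As a rough illustration of that check, here is a small Python sketch that groups requests by source address for one digest. It assumes you have already extracted `(source_ip, digest_hex)` pairs from the capture. How you extract them depends on your tcpdump options and is not shown here. The function names are hypothetical helpers for this example only.

```python
from collections import Counter, defaultdict


def sources_by_digest(requests):
    """Map each digest to a Counter of the source IPs that requested it.

    `requests` is an iterable of (source_ip, digest_hex) pairs, assumed to
    have been extracted from the packet capture beforehand.
    """
    per_digest = defaultdict(Counter)
    for source_ip, digest in requests:
        per_digest[digest.lower()][source_ip] += 1
    return per_digest


def source_share(requests, digest):
    """Return (source_ip, count, share) tuples for one digest, busiest first."""
    counts = sources_by_digest(requests)[digest.lower()]
    total = sum(counts.values())
    return [(source_ip, count, count / total)
            for source_ip, count in counts.most_common()]
```

If one address holds nearly all of the share for the suspect digest, that machine (or whatever sits behind it) is the place to look in the application.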

sunil answered Nov 01 '22 15:11