Redis PFADD to check a exists-in-set query

Question

I have a requirement to process multiple records from a queue. But due to some external issues the items may sporadically occur multiple times. I need to process items only once

What I planned to use is PFADD into redis every record ( as a md5sum) and then see if that returns success. If that shows no increment then the record is a duplicate else process the record.

This seems pretty straightforward , but I am getting too many false positives while using PFADD

Is there a better way to do this ?

Itamar Haber · Accepted Answer

Being the probabilistic data structure that it is, Redis' HyperLogLog exhibits 0.81% standard error. You can reduce (but never get rid of) the probability for false positives by using multiple HLLs, each counting a the value of a different hash function on your record.

Also note that if you're using a single HLL there's no real need to hash the record - just PFADD as is.

Alternatively, use a Redis Set to keep all the identifiers/hashes/records and have 100%-accurate membership tests with SISMEMBER. This approach requires more (RAM) resources as you're storing each processed element, but unless your queue is really huge that shouldn't be a problem for a modest Redis instance. To keep memory consumption under control, switch between Sets according to the date and set an expiry on the Set keys (another approach is to use a single Sorted Set and manually remove old items from it by keeping their timestamp in the score).

Redis PFADD to check a exists-in-set query

Tags:

redis

Ram

1 Answers

Itamar Haber

Recent Activity

Donate For Us

Redis PFADD to check a exists-in-set query

Tags:

redis

Ram

1 Answers

Itamar Haber

Related questions

Recent Activity

Donate For Us