 

Redis Streams vs Kafka Streams/NATS


The Redis team introduced a new Streams data type in Redis 5.0. Since Streams look like Kafka topics at first glance, it is hard to find real-world examples of using them.

The Streams introduction includes a comparison with Kafka:

  1. Consumer groups are handled at runtime. For example, if one of three consumers fails permanently, Redis will continue to serve the first and second, because now there are effectively just two logical partitions (consumers).
  2. Redis Streams are much faster: they are stored and operated on in memory, so this one is a given.
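To make the consumer-group behavior in point 1 concrete, here is a toy in-memory model in Python. The class `ToyConsumerGroup` and its methods are hypothetical names that only mirror the Redis commands XREADGROUP, XACK and XCLAIM; it is not a client for a real server, it assumes a single group on one stream, and its `xclaim` is simplified (real XCLAIM also takes a minimum idle time and explicit entry IDs). What it illustrates: Redis does not rebalance partitions. New entries simply go to whichever consumers keep calling XREADGROUP, while entries already delivered to the crashed consumer stay in the group's pending entries list until another consumer claims them.

```python
class ToyConsumerGroup:
    """Toy model of one consumer group on one stream (not a real Redis client)."""

    def __init__(self):
        self.entries = []          # the stream itself: list of (entry_id, fields)
        self.next_undelivered = 0  # position of the first entry never delivered to the group
        self.pending = {}          # entry_id -> owning consumer, like the pending entries list

    def xadd(self, entry_id, fields):
        self.entries.append((entry_id, fields))

    def xreadgroup(self, consumer, count):
        # Like XREADGROUP with ID ">": hand out only never-delivered entries,
        # each one to exactly one consumer, and record it as pending.
        start = self.next_undelivered
        batch = self.entries[start:start + count]
        self.next_undelivered += len(batch)
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer
        return batch

    def xack(self, *entry_ids):
        # Acknowledged entries leave the pending list.
        for entry_id in entry_ids:
            self.pending.pop(entry_id, None)

    def xclaim(self, new_owner, dead_consumer):
        # Simplified XCLAIM: reassign every entry still pending for dead_consumer.
        claimed = [i for i, owner in self.pending.items() if owner == dead_consumer]
        for entry_id in claimed:
            self.pending[entry_id] = new_owner
        return claimed


group = ToyConsumerGroup()
for n in range(1, 7):
    group.xadd(f"{n}-0", {"bid": n})

batch1 = group.xreadgroup("c1", 2)
batch2 = group.xreadgroup("c2", 2)
batch3 = group.xreadgroup("c3", 2)          # c3 crashes before acknowledging
group.xack(*[entry_id for entry_id, _ in batch1 + batch2])

group.xadd("7-0", {"bid": 7})
new_for_c1 = group.xreadgroup("c1", 10)     # new entries still flow to the survivors
claimed = group.xclaim("c2", "c3")
print(new_for_c1)  # [('7-0', {'bid': 7})]
print(claimed)     # ['5-0', '6-0']
```

In a real deployment the "claim" step is something the surviving consumers (or a supervisor) must do themselves, typically after inspecting the pending list with XPENDING.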

We have some projects with Kafka, RabbitMQ and NATS. Now we are taking a deep look into Redis Streams, trying to use them as a "pre-Kafka cache" and in some cases as a Kafka/NATS alternative. The most critical point right now is replication:

  1. All data is stored in memory; durability relies on AOF persistence and replication.
  2. By default, asynchronous replication does not guarantee that XADD commands or consumer group state changes are replicated: after a failover, something can be missing, depending on whether the followers managed to receive the data from the master. This alone looks like enough to kill any interest in trying Streams under high load.
  3. The Redis failover process, as operated by Sentinel or Redis Cluster, performs only a best-effort check to fail over to the most up-to-date follower, and under certain specific failures may promote a follower that lacks some data.
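Points 2 and 3 come down to the master replying to XADD before any replica has the entry. The toy Python model below (class and method names are hypothetical, not a Redis API) assumes one master, one replica and a replication link that has fallen a few entries behind at the moment of failover, and shows which acknowledged entries disappear. Consumer group state changes from XREADGROUP and XACK travel over the same asynchronous link, so they can be lost the same way. Redis's WAIT command lets a client block until a given number of replicas have acknowledged its writes, which narrows this window, but the Redis documentation states that it does not make Redis strongly consistent.

```python
class ToyAsyncReplication:
    """Toy model of one master and one replica with asynchronous replication."""

    def __init__(self):
        self.master_log = []   # entries the master has already acknowledged to clients
        self.replica_log = []  # entries the replica has actually received

    def xadd(self, entry_id):
        # The client gets its reply as soon as the master has the entry.
        self.master_log.append(entry_id)

    def replicate(self, n):
        # Ship at most n not-yet-replicated entries to the replica.
        start = len(self.replica_log)
        self.replica_log.extend(self.master_log[start:start + n])

    def failover(self):
        # Promote the replica: acknowledged entries it never received are gone.
        lost = self.master_log[len(self.replica_log):]
        self.master_log = list(self.replica_log)
        return lost


repl = ToyAsyncReplication()
for n in range(1, 11):
    repl.xadd(f"{n}-0")
repl.replicate(7)        # replica is 3 entries behind when the master dies
lost = repl.failover()
print(lost)              # ['8-0', '9-0', '10-0']
```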

Then there is the capping strategy. The real "capped resource" with Redis Streams is memory, so it does not really matter how many items you want to store or which capping strategy you use: each time your consumer fails, you get either a peak in memory consumption or messages lost to the cap.
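The trade-off in the capping paragraph can be shown with a toy Python model. `capped_stream_backlog` is a hypothetical helper, and `collections.deque(maxlen=...)` stands in for a stream written with `XADD ... MAXLEN <n>`. It assumes the only consumer stops reading while the producer keeps writing, and it uses an exact cap (real `MAXLEN ~` trims only approximately). Whatever the cap, the backlog either fits under it, costing memory, or the oldest unread entries are evicted.

```python
from collections import deque


def capped_stream_backlog(maxlen: int, written_while_offline: int):
    """Toy model of XADD ... MAXLEN <maxlen> while the only consumer is offline.

    Returns (still_readable, trimmed): entry numbers the consumer finds when it comes
    back, and entry numbers the cap evicted before they were ever read.
    """
    stream = deque(maxlen=maxlen)  # exact cap; real "MAXLEN ~" trims approximately
    trimmed = []
    for n in range(written_while_offline):
        if len(stream) == maxlen:
            trimmed.append(stream[0])  # this append evicts the oldest entry
        stream.append(n)
    return list(stream), trimmed


readable, lost = capped_stream_backlog(maxlen=5, written_while_offline=8)
print(readable)  # [3, 4, 5, 6, 7]
print(lost)      # [0, 1, 2]
```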

We use Kafka as the frontend of an RTB bidder that handles ~1,100,000 messages per second with a ~120-byte payload. With Redis we see ~170 MB/s of memory consumption on writes, so a server with 512 GB of RAM gives us a write "reserve" of only ~50 minutes of data. If the processing system were offline for that long, we would crash.
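The back-of-the-envelope numbers in this paragraph can be checked in a few lines of Python. This assumes decimal units (1 MB = 10^6 bytes, 1 GB = 1000 MB), treats the quoted ~170 MB/s as the measured growth of Redis memory, and optimistically assumes the whole 512 GB is available to the stream. The difference from the raw payload rate is the per-message overhead implied by those figures, not a value taken from Redis documentation; switching to binary units changes the result only slightly.

```python
MESSAGES_PER_SEC = 1_100_000
PAYLOAD_BYTES = 120
OBSERVED_MB_PER_SEC = 170   # memory growth measured on writes (figure from the question)
RAM_GB = 512

# Raw payload rate, before any Redis overhead.
raw_mb_per_sec = MESSAGES_PER_SEC * PAYLOAD_BYTES / 1_000_000

# Per-message overhead implied by the observed growth rate.
overhead_bytes = OBSERVED_MB_PER_SEC * 1_000_000 / MESSAGES_PER_SEC - PAYLOAD_BYTES

# How long the whole server RAM lasts if nothing is consumed or trimmed.
reserve_minutes = RAM_GB * 1000 / OBSERVED_MB_PER_SEC / 60

print(f"raw payload: {raw_mb_per_sec:.0f} MB/s")           # raw payload: 132 MB/s
print(f"implied overhead: {overhead_bytes:.1f} B/message")  # implied overhead: 34.5 B/message
print(f"write reserve: {reserve_minutes:.1f} min")          # write reserve: 50.2 min
```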

Could you please tell us more about Redis Streams usage in the real world, and maybe some cases where you have tried them yourself? Or are Redis Streams perhaps only suited to small amounts of data?

asked Oct 25 '18 17:10 by Nick Bondarenko



People also ask


What is a Redis Stream?

A Redis stream is conceptually equivalent to a single partition of a Kafka topic, with small differences:

  1. It is a persistent, ordered store of events (same as in Kafka).
  2. It has a configurable maximum length (vs. a retention period in Kafka).
  3. Events store field-value pairs, like a Redis Hash (vs. a single key and value in Kafka).
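The "ordered store" property comes from how stream entry IDs are built: `<milliseconds>-<sequence>`, where an auto-generated ID (`XADD key * ...`) uses the current time and, if the clock has not advanced or has gone backwards, reuses the last entry's milliseconds and increments the sequence number. The toy Python model below (class name hypothetical; the clock is passed in explicitly so the behavior is reproducible) sketches only that rule plus hash-like field-value entries. It is not how the server stores data internally.

```python
class ToyStream:
    """Toy model of stream entry IDs and field-value entries (not a Redis client)."""

    def __init__(self):
        self.entries = []      # (entry_id, fields) pairs, always in ID order
        self.last_id = (0, 0)  # (milliseconds, sequence) of the newest entry

    def xadd(self, fields, now_ms):
        # Mimics auto-generated IDs: each new ID is strictly greater than the last.
        last_ms, last_seq = self.last_id
        if now_ms > last_ms:
            new_id = (now_ms, 0)               # clock moved forward: new millisecond
        else:
            new_id = (last_ms, last_seq + 1)   # same ms, or the clock went backwards
        self.last_id = new_id
        self.entries.append((new_id, dict(fields)))
        return f"{new_id[0]}-{new_id[1]}"


stream = ToyStream()
ids = [
    stream.xadd({"bid": "1", "price": "0.42"}, now_ms=1540487400000),
    stream.xadd({"bid": "2", "price": "0.40"}, now_ms=1540487400000),  # same millisecond
    stream.xadd({"bid": "3", "price": "0.45"}, now_ms=1540487399990),  # clock jumped back
    stream.xadd({"bid": "4", "price": "0.41"}, now_ms=1540487400005),
]
print(ids)
# ['1540487400000-0', '1540487400000-1', '1540487400000-2', '1540487400005-0']
```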

What is the difference between Kafka and Redis events?

The major difference is that consumer groups in Redis are nothing like consumer groups in Kafka. In Redis, a consumer group is a set of processes all reading from the same stream, and Redis ensures that each event is delivered to only one consumer in the group.



1 Answer

Long time no see. This feels like a discussion that belongs on the redis-db mailing list, but the use case sounds fascinating.

Note that Redis Streams are not intended to be a Kafka replacement; they provide different properties and capabilities despite the similarities. You are, of course, correct about the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partitioning your streams across period-based key names.
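A possible shape for the period-based partitioning, as a Python sketch: one stream per UTC hour under a hypothetical key scheme such as `bids:2018102517`, plus the Redis Cluster key-to-slot rule that decides which node owns each key (CRC16, XMODEM variant, of the key modulo 16384, hashing only the part inside `{...}` when the key contains a non-empty hash tag). Consecutive periods will typically land in different slots, and therefore potentially on different nodes, and an expired period can be dropped as a whole key (for example with DEL or EXPIRE) instead of being trimmed entry by entry. The naming scheme and helper names are assumptions for illustration; putting a shared hash tag such as `{bids}` in every key would pin all periods to one slot and defeat the purpose.

```python
from datetime import datetime, timezone


def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, as used by Redis Cluster.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    # If the key has a non-empty {hash tag}, only the tag is hashed.
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def period_stream_key(prefix: str, ts: datetime) -> str:
    # Hypothetical naming scheme: one stream per UTC hour, e.g. "bids:2018102517".
    return f"{prefix}:{ts.astimezone(timezone.utc):%Y%m%d%H}"


ts = datetime(2018, 10, 25, 17, 10, tzinfo=timezone.utc)
key = period_stream_key("bids", ts)
print(key, hash_slot(key))
```

Producers would write to the key for the current hour, while consumers need to move on to the next key once a period is fully processed; that hand-over logic is left to the application.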

answered Oct 21 '22 12:10 by Itamar Haber
