Why enable record caches in the Kafka Streams Processor API if RocksDB is buffered in memory?

I am trying to understand RocksDB behavior in the Kafka Streams Processor API. I am configuring a persistent StateStore using the default RocksDB store that Kafka Streams provides.

StoreBuilder<KeyValueStore<String, Long>> countStoreBuilder =
  Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("Counts"),
    Serdes.String(),
    Serdes.Long());

I am not doing any aggregation, join, or windowing. I am just receiving records, comparing some of them to previous items in the store, and storing some of the records I receive in the state store.
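
For context, a minimal sketch of that pattern (assuming the classic Processor API of the Kafka 2.x line; the class name and the keep-the-larger-value rule are made up for illustration):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class CompareAndStoreProcessor implements Processor<String, Long> {

  private KeyValueStore<String, Long> store;

  @Override
  @SuppressWarnings("unchecked")
  public void init(ProcessorContext context) {
    // "Counts" must be connected to this processor when the topology is
    // built, e.g. via Topology#addStateStore(countStoreBuilder, ...)
    store = (KeyValueStore<String, Long>) context.getStateStore("Counts");
  }

  @Override
  public void process(String key, Long value) {
    Long previous = store.get(key);           // read previous item from the store
    if (previous == null || value > previous) {
      store.put(key, value);                  // conditionally remember this record
    }
  }

  @Override
  public void close() {}
}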

The developer guide mentions that you can enable record caches in the Processor API by calling .withCachingEnabled() on the above builder.
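
With the builder above, that is a single extra call (a sketch, reusing the same variable name):

StoreBuilder<KeyValueStore<String, Long>> countStoreBuilder =
  Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("Counts"),
    Serdes.String(),
    Serdes.Long())
  .withCachingEnabled();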

The cache "serves as a read cache to speed up reading data from a state store" - Record Caches Kafka Streams

However, my understanding is that RocksDB in persistent mode first buffers state in memory and spills to disk only if the state doesn't fit in RAM.

"RocksDB is just used as an internal lookup table (that is able to flush to disk if the state does not fit into memory) [...] RocksDB flushing is only required because state could be larger than available main-memory." - Kafka Streams Internal Data Management

So how do record caches speed up reads from the state store if both are buffered in memory? It seems to me that record caches overlap with what RocksDB already does.

asked May 29 '19 by iah10


People also ask

Does Kafka Streams use RocksDB?

Kafka Streams uses RocksDB as the default storage engine for persistent stores. To change the default configuration for RocksDB, implement RocksDBConfigSetter and provide your custom class via the rocksdb.config.setter configuration.

Does Kafka have cache?

With default settings, caching is enabled within Kafka Streams but RocksDB caching is disabled. Thus, to avoid high write traffic, it is recommended to enable RocksDB caching if Kafka Streams caching is turned off. For example, the RocksDB block cache could be set to 100 MB and the write buffer size to 32 MB.
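
A sketch of such a setter (assuming a Kafka Streams 2.3+ / 2019-era rocksdbjni API, where BlockBasedTableConfig#setBlockCacheSize was still available; newer rocksdbjni versions use an explicit LRUCache object instead):

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class CustomRocksDBConfig implements RocksDBConfigSetter {

  @Override
  public void setConfig(String storeName, Options options, Map<String, Object> configs) {
    BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
    tableConfig.setBlockCacheSize(100 * 1024 * 1024L); // 100 MB block cache (reads)
    options.setTableFormatConfig(tableConfig);
    options.setWriteBufferSize(32 * 1024 * 1024L);     // 32 MB write buffer / memtable (writes)
  }

  @Override
  public void close(String storeName, Options options) {
    // nothing store-specific to release in this sketch
  }
}

The class is then registered via the rocksdb.config.setter property, i.e. props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class).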

What is Kafka cache?

KCache is a client library that provides an in-memory cache backed by a compacted topic in Kafka. It is one of the patterns for using Kafka as a persistent store, as described by Jay Kreps in the article It's Okay to Store Data in Apache Kafka.

Is Kafka memory-based?

Memory. Kafka relies heavily on the filesystem for storing and caching messages. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.


1 Answer

Your observation is correct, and it depends on the use case whether caching is desired or not. One big advantage of application-level caching (instead of RocksDB caching) is that it reduces the number of records written into the changelog topic that is used to make the store fault-tolerant. Hence, it reduces the load on the Kafka cluster and may also reduce recovery time.
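
The record cache is sized per application instance and is flushed on every commit; a sketch of the two settings that control how much deduplication happens before records reach RocksDB and the changelog (the 10 MB / 1000 ms values are purely illustrative):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class CacheSizingExample {

  public static Properties cacheProps() {
    Properties props = new Properties();
    // Total bytes for record caches across all threads of this instance;
    // a larger cache absorbs more repeated updates per key before flushing.
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
    // Caches are flushed on commit, so the commit interval bounds how long
    // a record can be held back (and deduplicated) in the cache.
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    return props;
  }
}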

For DSL users, caching also has an impact on downstream load (something that may not matter for your application, as it seems you are using the Processor API):

  • https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers
  • https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/
  • https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
  • https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management
answered Nov 15 '22 by Matthias J. Sax