
Why is the Kafka Streams state dir in /tmp/kafka-streams?

I'm not sure whether this has already been answered. I couldn't find a proper explanation, so I'm posting my question here.

Why is the Kafka Streams state.dir stored under /tmp/kafka-streams?

I know I can change the path by setting the state dir config in the streams code, like this:

StreamsConfig.STATE_DIR_CONFIG, "/var/abc-Streams"

But will changing the directory have any impact?

Or

can I configure the state store in an application directory rather than in /tmp?

According to the Confluent documentation on stateful operations, Kafka Streams

automatically creates and manages such state stores when you are calling stateful operators such as count() or aggregate(), or when you are windowing a stream

but it doesn't specify where exactly the state is stored.

Any thoughts?

kuti asked Jan 27 '23 14:01


1 Answer

Why is the Kafka Streams state.dir stored under /tmp/kafka-streams?

There are several reasons.

  1. The /tmp directory is usually writable by every user, so as a beginner you don't have to struggle with write permissions.
  2. The /tmp directory is short-lived. On many systems it is cleaned on reboot, so the disk doesn't fill up if you forget to delete the state directory. The downside is that you lose the state from previous runs, so the state stores have to be rebuilt from scratch (by default, restored from their changelog topics).

If you want to reuse the state stored in state.dir across restarts, store it somewhere other than /tmp.

All state stores are kept under the location specified in state.dir. If it isn't specified, they go under the /tmp/kafka-streams/<app-id> directory.
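
As a rough illustration of the rule above, here is a minimal sketch in plain Java that uses only the standard library. It assumes the literal keys "application.id" and "state.dir" (the string values behind StreamsConfig.APPLICATION_ID_CONFIG and StreamsConfig.STATE_DIR_CONFIG). The class StateDirExample and its helper methods are hypothetical names made up for this example and are not part of Kafka Streams. The sketch only mimics the path layout described in this answer; it does not start a streams application.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the Kafka Streams API.
public class StateDirExample {
    // Literal config keys (assumed to match StreamsConfig.APPLICATION_ID_CONFIG / STATE_DIR_CONFIG).
    static final String APPLICATION_ID = "application.id";
    static final String STATE_DIR = "state.dir";
    // Default base directory as described in the answer above.
    static final String DEFAULT_STATE_DIR = "/tmp/kafka-streams";

    // Builds the properties you would hand to a streams application.
    // Pass null for stateDir to fall back to the default location.
    static Properties streamsProps(String appId, String stateDir) {
        Properties props = new Properties();
        props.put(APPLICATION_ID, appId);
        if (stateDir != null) {
            props.put(STATE_DIR, stateDir);
        }
        return props;
    }

    // Mimics the layout <state.dir>/<application.id> described in the answer.
    static Path resolveStateDir(Properties props) {
        String base = props.getProperty(STATE_DIR, DEFAULT_STATE_DIR);
        return Paths.get(base, props.getProperty(APPLICATION_ID));
    }

    public static void main(String[] args) {
        // No state.dir set: falls back to the default under /tmp.
        System.out.println(resolveStateDir(streamsProps("abc-app", null)));
        // state.dir set explicitly: state lives under the application directory.
        System.out.println(resolveStateDir(streamsProps("abc-app", "/var/abc-Streams")));
    }
}
```

In a real application the same two keys would go into the Properties object passed to the KafkaStreams constructor, and the user running the process needs write access to the chosen directory.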

Nishu Tayal answered Feb 05 '23 17:02