The Apache Pulsar topic documentation says we can set a topic's time-based retention policy to -1 for infinite retention. What are the downsides of infinite retention, and can we use Pulsar as a message store where data lives forever in topics and build an event-sourcing application around it?
The downside is that your data will grow forever. However, due to the segment-based architecture of the underlying storage (BookKeeper), more space can be added by adding storage nodes (i.e. all the data doesn't have to fit on one machine, as is the case in some other systems).
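To get a feel for what "grow forever" means in practice, here is a rough back-of-the-envelope sketch. The ingest rate, write quorum, disk size, and fill ratio are illustrative assumptions, not Pulsar defaults, and the function name is made up for this example.

```python
import math

def bookies_needed(ingest_mb_per_s, days, write_quorum, disk_tb_per_bookie, max_fill=0.8):
    """Rough estimate of storage nodes needed to hold `days` of retained data.

    All parameters are hypothetical planning inputs, not Pulsar settings.
    """
    seconds = days * 24 * 60 * 60
    # Each entry is written to `write_quorum` storage nodes.
    stored_mb = ingest_mb_per_s * seconds * write_quorum
    # Decimal units: 1 TB = 1,000,000 MB; leave headroom via max_fill.
    usable_mb_per_bookie = disk_tb_per_bookie * 1_000_000 * max_fill
    # You need at least as many nodes as the write quorum.
    return max(write_quorum, math.ceil(stored_mb / usable_mb_per_bookie))

# Assumed workload: 5 MB/s ingest, 2 copies of each entry, 4 TB per node.
for years in (1, 3, 5):
    print(years, "year(s):", bookies_needed(5, 365 * years, 2, 4), "storage nodes")
```

The number of nodes grows linearly with time, which is why unbounded retention on the storage nodes alone gets expensive.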
The segment-based architecture also makes it fairly straightforward to move data to a bulk storage system (S3 or similar) while still having it available from Pulsar. However, this is still in the early stages of discussion right now.
Actually, you can and should use Pulsar's Tiered Storage option to offload your older data to more cost-effective storage such as S3, Google Cloud Storage, or HDFS. Unlike Kafka, Pulsar decouples the serving layer from the storage layer, which is what makes this possible. In Kafka, you would have to "add hard drives endlessly", along with broker instances to host them.
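As a minimal sketch of how the two settings fit together, the snippet below only builds the `pulsar-admin` command lines for infinite retention and for an offload threshold; it does not run them. The namespace `my-tenant/my-ns` and the `10G` threshold are placeholders, the helper names are invented for this example, and it assumes an offload driver has already been configured on the brokers.

```python
import shlex

def retention_cmd(namespace, size="-1", time="-1"):
    # -1 for both size and time asks Pulsar to retain data indefinitely.
    return ["pulsar-admin", "namespaces", "set-retention", namespace,
            "--size", size, "--time", time]

def offload_threshold_cmd(namespace, size):
    # Once a topic's data on the storage nodes exceeds `size`,
    # older segments become candidates for offload to tiered storage.
    return ["pulsar-admin", "namespaces", "set-offload-threshold",
            "--size", size, namespace]

namespace = "my-tenant/my-ns"  # placeholder namespace
commands = [retention_cmd(namespace), offload_threshold_cmd(namespace, "10G")]

for cmd in commands:
    # Print the commands for review; pass a list to subprocess.run(cmd, check=True)
    # on a machine where pulsar-admin is installed to apply it.
    print(shlex.join(cmd))
```

With both in place, topics keep their full history for event sourcing while most of the older data sits in cheaper storage.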