I'm wondering where to store the data coming out of Cassandra: something like a "data lake" where I can put all the processed data at the end of each day, or something along those lines.
I'm looking for a distributed, reliable storage solution that protects against data loss.
Cassandra has its own file system, CFS, but where should that be stored?
Cassandra has built-in resiliency in the form of real-time, asynchronous replication. In most cases, running Cassandra on any special file system other than ext4, ZFS, and the like can cause problems.
Most users rely on Cassandra's replication alone, though some also take backups, which they typically either upload to cloud storage or copy to separate mount points.
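As a rough illustration of the "separate mount point" option, here is a minimal sketch that takes a snapshot with `nodetool snapshot -t <tag> <keyspace>` and copies the resulting snapshot directories elsewhere. It assumes the default on-disk layout `<data_dir>/<keyspace>/<table-dir>/snapshots/<tag>/`; the function names, paths, keyspace, and tag are hypothetical placeholders, not an official tool.

```python
"""Sketch: snapshot a keyspace and copy the snapshot to a separate mount point.

Assumes the default layout <data_dir>/<keyspace>/<table-dir>/snapshots/<tag>/.
Function names, paths, keyspace, and tag are hypothetical placeholders.
"""
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def take_snapshot(tag: str, keyspace: str) -> None:
    # By default nodetool flushes memtables, then hard-links the SSTables under snapshots/<tag>.
    subprocess.run(["nodetool", "snapshot", "-t", tag, keyspace], check=True)


def copy_snapshot(data_dir: Path, keyspace: str, tag: str, backup_root: Path) -> list[Path]:
    """Copy each table's snapshots/<tag> dir to backup_root/<tag>/<keyspace>/<table-dir>."""
    copied = []
    for snap in sorted((data_dir / keyspace).glob(f"*/snapshots/{tag}")):
        table_dir = snap.parent.parent.name  # e.g. "users-<table id>"
        dest = backup_root / tag / keyspace / table_dir
        shutil.copytree(snap, dest, dirs_exist_ok=True)  # Python 3.8+
        copied.append(dest)
    return copied


# Example usage (hypothetical paths; run on each node):
#   take_snapshot("daily", "my_keyspace")
#   copy_snapshot(Path("/var/lib/cassandra/data"), "my_keyspace", "daily", Path("/mnt/backup"))
```

Because snapshots are per node, a job like this would need to run on every node; uploading to cloud storage instead would replace the `copytree` call with your storage client of choice.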
If you meant that you want to take your data out of Cassandra and store it somewhere else, such as a data lake, I suggest using Spark to bulk-read the data out of Cassandra efficiently, then writing it out to flat files or to the system of your choice.
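A minimal PySpark sketch of that approach, assuming the DataStax Spark Cassandra Connector is on the Spark classpath (for example via `spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:<version>`). The keyspace, table, host, output path, and the optional day column used for partitioning are hypothetical placeholders; Parquet is used here as the flat-file format, but `writer.csv(...)` would work the same way.

```python
"""Sketch: bulk-export a Cassandra table to Parquet files with PySpark.

Assumes the Spark Cassandra Connector is available to Spark.
Keyspace, table, host, output path, and partition column are hypothetical.
"""
from typing import Optional


def export_table(spark, keyspace: str, table: str, out_path: str,
                 partition_col: Optional[str] = None) -> None:
    # The connector's DataFrame source splits the read into Spark partitions by token range.
    df = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )
    writer = df.write.mode("overwrite")  # replaces anything already at out_path
    if partition_col:
        # One sub-directory per distinct value, e.g. out_path/day=2024-01-01/
        writer = writer.partitionBy(partition_col)
    writer.parquet(out_path)


def build_session(host: str):
    # Imported lazily so export_table itself has no hard dependency on pyspark.
    from pyspark.sql import SparkSession
    return (
        SparkSession.builder.appName("cassandra-export")
        .config("spark.cassandra.connection.host", host)
        .getOrCreate()
    )


# Example usage (hypothetical values):
#   spark = build_session("10.0.0.1")
#   export_table(spark, "my_keyspace", "events", "s3a://my-lake/events", partition_col="day")
```

Passing the `SparkSession` in, rather than creating it inside `export_table`, keeps the export logic reusable from an existing job; for a nightly load you would likely filter to the current day and use `mode("append")` instead of overwriting the whole path.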