
Best data storage filesystem to use with apache cassandra?

I'm wondering where to store the data produced by Cassandra. I have in mind something like a "data lake" where I can put all processed data, for example at the end of each day.

I'm looking for a distributed, reliable storage solution that protects against data loss.

Cassandra has its own file system, called CFS, but where should that be stored?

DTodt asked Feb 05 '23 16:02


1 Answer

Cassandra has resiliency built in, in the form of its real-time, asynchronous replication. In most cases, running it on anything other than a standard local file system such as ext4 or ZFS can cause issues in the Cassandra world.

Most users rely on Cassandra's replication alone, though some also take backups, which they tend to either upload to cloud storage or write to separate mount points.
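To make the replication point concrete: the number of copies Cassandra keeps is set per keyspace. The sketch below only builds the CQL statement. The keyspace name `analytics`, the datacenter name `dc1`, and the contact point are placeholders, and `apply_cql` assumes the DataStax `cassandra-driver` package is installed.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to the number of replicas
    to keep there, e.g. {"dc1": 3}. All names here are placeholders.
    """
    dc_part = ", ".join(
        f"'{dc}': {count}" for dc, count in sorted(replicas_per_dc.items())
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dc_part}}}"
    )


def apply_cql(cql, contact_points=("127.0.0.1",)):
    """Run one statement against a cluster (requires cassandra-driver)."""
    from cassandra.cluster import Cluster  # imported lazily on purpose

    cluster = Cluster(list(contact_points))
    try:
        cluster.connect().execute(cql)
    finally:
        cluster.shutdown()
```

With three replicas in a datacenter, losing a single node does not lose data, which is why an extra distributed file system underneath is usually unnecessary.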

If you meant that you want to take your data out of Cassandra and store it somewhere else, like a data lake, I suggest using Spark to bulk-read the data out of Cassandra efficiently, then write it out to flat files or to the system of your choice.
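A minimal PySpark sketch of that export, assuming the Spark Cassandra Connector is on the Spark classpath. The keyspace `analytics`, the table `events`, the host, and the output path are all hypothetical, and Parquet is just one possible flat-file format.

```python
def cassandra_read_options(keyspace, table):
    """Options the Spark Cassandra Connector uses to locate a table."""
    return {"keyspace": keyspace, "table": table}


def export_table(spark, keyspace, table, output_path):
    """Bulk-read one Cassandra table and write it out as Parquet files."""
    df = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(**cassandra_read_options(keyspace, table))
        .load()
    )
    # "overwrite" replaces any previous export at the same path.
    df.write.mode("overwrite").parquet(output_path)


def main():
    # Imported here so the helpers above load without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("cassandra-export")
        .config("spark.cassandra.connection.host", "127.0.0.1")  # placeholder
        .getOrCreate()
    )
    try:
        export_table(spark, "analytics", "events", "/tmp/lake/events")
    finally:
        spark.stop()
```

Calling `main()` from a script submitted with `spark-submit` would run the export. The output path could equally point at HDFS or an object store, depending on what backs the data lake.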

MarcintheCloud answered Feb 07 '23 05:02