We are looking at using Cassandra to store a stream of information coming from various sources.
One issue we are facing is the best way to query between two dates.
For example, we will need to retrieve the objects created between datetime dt1 and datetime dt2.
We are currently considering using the created Unix timestamp as the key pointing to the actual object, then using get_key_range to retrieve the objects in that range.
Obviously this wouldn't work if two items have the same timestamp.
Is this the best way to handle datetime ranges in NoSQL stores in general?
Storing time series data. Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale.
Cassandra rows can be very large, so consider modeling the events as columns in a row rather than as rows in a column family (CF); you can then use the column slice operations, which are faster than row slices. If there are no "natural" keys associated with the data, you can use daily or hourly keys like "2010/02/08 13:00".
Otherwise, yes, using range queries (get_key_range is deprecated in 0.5; use get_range_slice) is your best option.
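To make the columns-in-a-row idea concrete, here is a minimal in-memory sketch in Python. It does not talk to Cassandra at all; it only imitates the layout, assuming hourly row keys and column names made of the timestamp plus a random id so that two items with the same timestamp do not collide. The names TimeBucketedStore, bucket_key, insert and query are hypothetical and chosen for illustration only.

```python
import bisect
import uuid
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_key(ts: datetime) -> str:
    """Hourly row key in the style suggested above, e.g. '2010/02/08 13:00'."""
    return ts.strftime("%Y/%m/%d %H:00")


class TimeBucketedStore:
    """Toy stand-in for one column family: row key -> columns sorted by name."""

    def __init__(self):
        # Each row holds a sorted list of (timestamp, unique id, value).
        # (timestamp, unique id) plays the role of the column name.
        self.rows = defaultdict(list)

    def insert(self, ts: datetime, value) -> None:
        # The random id keeps column names unique when timestamps are equal.
        column = (ts, uuid.uuid4().hex, value)
        bisect.insort(self.rows[bucket_key(ts)], column)

    def query(self, dt1: datetime, dt2: datetime) -> list:
        """Return values with dt1 <= timestamp <= dt2, oldest first."""
        results = []
        hour = dt1.replace(minute=0, second=0, microsecond=0)
        while hour <= dt2:
            columns = self.rows.get(bucket_key(hour), [])
            # Column slice inside one row: binary search on the sorted names.
            lo = bisect.bisect_left(columns, (dt1,))
            hi = bisect.bisect_right(columns, (dt2, "\uffff"))
            results.extend(value for _, _, value in columns[lo:hi])
            hour += timedelta(hours=1)
        return results


if __name__ == "__main__":
    store = TimeBucketedStore()
    start = datetime(2010, 2, 8, 13, 30)
    store.insert(start, "first")
    store.insert(start, "second")  # same timestamp, kept as a separate column
    store.insert(start + timedelta(hours=1), "third")
    print(store.query(start, start + timedelta(hours=2)))
```

A range query then touches one row per hour between dt1 and dt2 and slices the columns inside each row. In a real cluster the bucket size (hourly, daily) would need to be chosen so that individual rows stay at a manageable size.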