Columnar storage: Cassandra vs Redshift

How is columnar storage in a NoSQL database like Cassandra different from columnar storage in Redshift? If Cassandra is also a columnar store, why isn't it used for OLAP applications the way Redshift is?

945

asked Oct 10 '18 11:10

p0712

1 Answer

The storage engines of Cassandra and Redshift are very different and were designed for different use cases. Cassandra's storage is not really "columnar" in the commonly understood sense used for databases like Redshift, Vertica, etc.; it is much closer to the key-value family in the NoSQL world. The SQL-like syntax used in Cassandra (CQL) is not ANSI SQL, and only a very limited set of queries can be run with it. Cassandra's engine is built for fast writes and reads of records by key, while Redshift's engine is built for fast aggregations (MPP), has broad support for analytical queries, and stores, encodes, and compresses data at the column level.
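To make the layout difference concrete, here is a toy Python sketch. It is only an illustration: the dictionaries and function names are made up, and neither engine actually stores data like this (real engines add compression, indexes, and on-disk formats).

```python
# Toy illustration only; all names and values here are hypothetical.

# Key-value / row-style layout (closer to Cassandra's model):
# one lookup by key returns the whole record.
rows_by_key = {
    "u1": {"weight": 70.0, "height": 175.0},
    "u2": {"weight": 82.5, "height": 180.0},
    "u3": {"weight": 64.5, "height": 162.0},
}

# Column-style layout (closer to Redshift's model):
# each column is stored contiguously, so one column can be read alone.
columns = {
    "user_id": ["u1", "u2", "u3"],
    "weight": [70.0, 82.5, 64.5],
    "height": [175.0, 180.0, 162.0],
}


def get_record(key):
    """Point lookup: cheap in the key-value layout."""
    return rows_by_key[key]


def avg_weight_columnar():
    """Aggregation that touches only the 'weight' column."""
    weights = columns["weight"]
    return sum(weights) / len(weights)


def avg_weight_row_store():
    """Aggregation that has to visit every record to pull out one field."""
    weights = [record["weight"] for record in rows_by_key.values()]
    return sum(weights) / len(weights)
```

Both aggregation functions return the same number; the point is how much data each one has to walk through to get it.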

This is easiest to understand with the following example:

Suppose we have a table with a user id and many metrics (for example weight, height, blood pressure, etc.). If we run an aggregate query in Redshift, such as average weight, it will do the following (in the best case):

1. The master node sends the query to the compute nodes.
2. Only the data for this specific column is fetched from storage.
3. The query is executed in parallel on all nodes.
4. The final result is returned to the master node.
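
The steps above can be sketched roughly as a scatter-gather in Python. This is a simplified model, not Redshift's actual implementation: `node_slices`, `partial_avg_state`, and `master_avg` are hypothetical names, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" holds a slice of every column.
node_slices = [
    {"weight": [70.0, 82.5], "height": [175.0, 180.0]},
    {"weight": [64.5], "height": [162.0]},
    {"weight": [90.0, 55.0, 78.0], "height": [185.0, 158.0, 172.0]},
]


def partial_avg_state(node, column):
    """Runs on a node: read only the requested column, return (sum, count)."""
    values = node[column]
    return sum(values), len(values)


def master_avg(column):
    """Runs on the master: fan the query out, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda node: partial_avg_state(node, column), node_slices)
        )
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Each node returns a partial sum and a count rather than its own average, because averaging the per-node averages would give the wrong answer when nodes hold different numbers of rows.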

Running the same query in Cassandra results in a scan of all "rows"; each "row" can have several versions, and only the latest one should be used in the aggregation. If you are familiar with any key-value store (Redis, Riak, DynamoDB, etc.), this is even less efficient than scanning all keys there.
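
As a rough sketch of why that scan is expensive, here is a simplified Python model of "several versions, latest wins". The `cells` list and its tuple layout are assumptions made for illustration, not Cassandra's real storage format.

```python
# Hypothetical stored cells: (row_key, column, write_timestamp, value).
# The same cell can appear several times, one entry per write.
cells = [
    ("u1", "weight", 100, 72.0),
    ("u1", "height", 100, 175.0),
    ("u2", "weight", 100, 80.0),
    ("u1", "weight", 250, 70.0),   # newer write for u1, supersedes 72.0
    ("u2", "height", 120, 180.0),
    ("u2", "weight", 90, 85.0),    # older write for u2, must be ignored
]


def latest_values(column):
    """Scan every cell and keep the newest value per row key."""
    newest = {}  # row_key -> (timestamp, value)
    for key, col, ts, value in cells:
        if col != column:
            continue  # the cell still had to be read to find this out
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, value)
    return {key: value for key, (ts, value) in newest.items()}


def avg(column):
    """Average over the latest version of each row's value."""
    values = list(latest_values(column).values())
    return sum(values) / len(values)
```

Unlike the columnar case, the loop walks over every stored cell, including other columns and stale versions, before it can compute a single average.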

Cassandra is often used for analytical workflows together with Spark, acting as the storage layer while Spark acts as the actual query engine; it generally shouldn't be used for analytical queries on its own. More aggregation capabilities are added with each release, but it is still very far from being a real analytical database.

160

answered Sep 30 '22 07:09

nevsv

Tags:

cassandra

amazon-redshift

column-oriented
