I am trying to calculate the partition size for each row in a table with an arbitrary number of columns and types, using a formula from the DataStax Academy Data Modeling Course.
In order to do that I need to know the size in bytes of some common Cassandra data types. I tried to Google this, but I got many conflicting suggestions, so I am puzzled.
The data types I would like to know the byte size of are:
Any other considerations regarding data type sizes in Cassandra would of course also be appreciated.
Adding more info, since it seems to be causing confusion: I am only trying to estimate the worst-case disk usage the data would occupy, without any compression or other optimizations that Cassandra does behind the scenes.
I am following the DataStax Academy course DS220 (see link at the end), implementing the formula from it, and will use the info from answers here as variables in that formula.
https://academy.datastax.com/courses/ds220-data-modeling/physical-partition-size
Partition size is measured by the number of cells (values) that are stored in the partition. Cassandra's hard limit is 2 billion cells per partition, but you'll likely run into performance issues before reaching that limit.
To calculate the size of a row, we need to sum the size of all columns within the row and add that sum to the partition key size. Assuming the size of the partition key is consistent throughout a table, calculating the size of a table is almost identical to calculating the size of a partition.
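As a minimal sketch of that per-row summation (my own illustration, not from the course; the column names and byte counts are made-up assumptions):

```python
# Rough per-row estimate: sum fixed column sizes and add the partition key size.
partition_key_bytes = 16            # e.g. a uuid partition key
column_bytes = {
    "created_at": 8,                # timestamp
    "score": 8,                     # double
    "active": 1,                    # boolean
    "name": 20,                     # text, assuming ~20 bytes on average
}

row_size_bytes = partition_key_bytes + sum(column_bytes.values())
print(row_size_bytes)  # 53 for this made-up row, before any storage overhead
```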
As we learned earlier, Cassandra uses a consistent hashing technique to generate the hash value of the partition key (app_name) and assign the row data to a partition range inside a node.
For example, if you had a 6-node DC with a replication factor of 3 and each node held about 100GB of data, the total size would be: table_size = (100GB + 100GB + 100GB + 100GB + 100GB + 100GB) / 3 = 600GB / 3 = 200GB.
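The same arithmetic as a quick sketch, assuming every node really holds the same amount of data:

```python
# Raw data across the cluster divided by the replication factor
# gives the logical table size (numbers from the example above).
node_count = 6
data_per_node_gb = 100
replication_factor = 3

table_size_gb = node_count * data_per_node_gb / replication_factor
print(table_size_gb)  # 200.0
```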
I think, from a pragmatic point of view, that it is wise to get a back-of-the-envelope estimate of the worst case up-front at design time, using the formulae in the DS220 course. The effect of compression often varies depending on the algorithm and on patterns in the data. From DS220 and http://cassandra.apache.org/doc/latest/cql/types.html (a rough sketch of the worst-case calculation follows the list below):
uuid: 16 bytes
timeuuid: 16 bytes
timestamp: 8 bytes
bigint: 8 bytes
counter: 8 bytes
double: 8 bytes
time: 8 bytes
inet: 4 bytes (IPv4) or 16 bytes (IPv6)
date: 4 bytes
float: 4 bytes
int: 4 bytes
smallint: 2 bytes
tinyint: 1 byte
boolean: 1 byte (hopefully; I have no authoritative source for this)
ascii: requires an estimate of average # chars * 1 byte/char
text/varchar: requires an estimate of average # chars * (avg. # bytes/char for language)
map/list/set/blob: requires an estimate based on the expected contents
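Here is the promised sketch: a small Python version of the uncompressed, worst-case partition size estimate in the spirit of the DS220 formula. The ~8-byte per-cell overhead and all example column sizes are my assumptions for illustration, not authoritative numbers.

```python
# Back-of-the-envelope, uncompressed partition-size estimate (DS220-style).
def partition_size_bytes(n_rows,
                         pk_sizes,        # byte sizes of partition key columns
                         static_sizes,    # byte sizes of static columns
                         regular_sizes,   # byte sizes of regular (per-row) columns
                         cell_overhead=8):  # assumed per-cell metadata overhead
    # Number of data cells: one per regular column per row, plus the statics.
    n_values = n_rows * len(regular_sizes) + len(static_sizes)
    return (sum(pk_sizes)
            + sum(static_sizes)
            + n_rows * sum(regular_sizes)
            + n_values * cell_overhead)

# Example: uuid partition key, one ~50-byte static text column,
# three regular columns (timestamp, double, ~20-byte text), 10,000 rows.
print(partition_size_bytes(10_000, [16], [50], [8, 8, 20]))  # roughly 600 KB
```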
Hope it helps.
The only reliable way to estimate the overhead associated with something is to actually measure it. Really, you can't take the individual data types and generalize from them. If you have 4 bigint columns and you suppose your overhead is X, with 400 bigint columns your overhead probably won't be 100x. That's because Cassandra compresses everything before storing data on disk (by default; it's a setting tunable per column family).
Try loading some data, I mean real production data, into the cluster, and then check your results against your compression configuration. You'll find some surprises.
Know your data.