In "Cassandra The Definitive Guide" (2nd edition) by Jeff Carpenter & Eben Hewitt, the following formula is used to calculate the size of a table on disk (apologies for the blurred part):
There are two things I don't understand in this equation.
First: why does the size of the clustering columns get counted once for every regular column? Shouldn't we multiply it by the number of rows instead? It seems to me that by calculating it this way, we're saying that the data in each clustering column gets replicated for each regular column, which I suppose is not the case.
Second: why don't the primary key columns get multiplied by the number of partitions? From my understanding, if we have a node with two partitions, then we should multiply the size of the primary key columns by two, because we'll have two different primary keys on that node.
If you need information about a table or tables, you can use the nodetool cfstats command. If you provide only the name of a keyspace, it will report stats for all the tables in that keyspace.
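For example (my_keyspace and my_table are placeholder names):
nodetool cfstats my_keyspace            # stats for every table in the keyspace
nodetool cfstats my_keyspace.my_table   # stats for a single table
In Cassandra 3.x the same output is also available under the newer name nodetool tablestats.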
Partition size is measured by the number of cells (values) that are stored in the partition. Cassandra's hard limit is 2 billion cells per partition, but you'll likely run into performance issues before reaching that limit.
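As a rough sketch, the cell count of a partition can be estimated from the schema (the variable names below are descriptive placeholders, and this ignores per-cell metadata):
N_cells = N_rows * (N_columns - N_primary_key_columns - N_static_columns) + N_static_columns
Static columns are stored once per partition rather than once per row, which is why they are subtracted from the per-row term and added back once.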
As mentioned earlier, a table was also called a column family in earlier versions of Cassandra, and it is still referred to as a column family in some of Cassandra's error messages and documentation. It is important to define a primary key for a table.
As the author, I greatly appreciate the question and your engagement with the material!
With respect to the original questions - remember that this is not the formula to calculate the size of the table, it is the formula to calculate the size of a single partition. The intent is to use this formula with "worst case" number of rows to identify overly large partitions. You'd need to multiply the result of this equation by the number of partitions to get an estimate of total data size for the table. And of course this does not take replication into account.
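In other words, as a back-of-the-envelope estimate:
Estimated Table Size = Worst-Case Partition Size * Number of Partitions
and the on-disk footprint across the cluster would additionally be multiplied by the replication factor.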
Also thanks to those who responded to the original question. Based on your feedback I spent some time looking at the new (3.0) storage format to see whether that might impact the formula. I agree that Aaron Morton's article is a helpful resource (link provided above).
The basic approach of the formula remains sound for the 3.0 storage format. The way the formula works, you're basically adding:
- the sizes of the partition key and static columns, counted once per partition
- the size of each row (its clustering column values plus its regular column values), multiplied by the number of rows
- the metadata overhead for each cell
Updating the formula for the 3.0 storage format requires revisiting the constants. For example, the original equation assumes 8 bytes of metadata per cell to store a timestamp. The new format treats the timestamp on a cell as optional since it can be applied at the row level. For this reason, there is now a variable amount of metadata per cell, which could be as low as 1-2 bytes, depending on the data type.
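A rough before-and-after sketch of a single cell (illustrative only, not the exact serialization format):
pre-3.0: cell = name + value + timestamp (8 bytes, always present)
3.0+:    cell = flags (1 byte) + value [+ optional timestamp, encoded as a small delta against the row-level timestamp]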
After reading this feedback and rereading that section of the chapter, I plan to update the text to add some clarifications as well as stronger caveats about this formula being useful as an approximation rather than an exact value. There are factors it doesn't account for at all such as writes being spread over multiple SSTables, as well as tombstones. We're actually planning another printing this spring (2017) to correct a few errata, so look for those changes soon.
It's because of the internal storage structure used by Cassandra versions earlier than 3.0.
Let's take an example:
CREATE TABLE my_table (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    d1 int,
    d2 int,
    s int STATIC,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
);
Insert some dummy data:
 pk1 | pk2 | ck1 | ck2  | s     | d1     | d2
-----+-----+-----+------+-------+--------+---------
   1 |  10 | 100 | 1000 | 10000 | 100000 | 1000000
   1 |  10 | 100 | 1001 | 10000 | 100001 | 1000001
   2 |  20 | 200 | 2000 | 20000 | 200000 | 2000000
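For reference, rows like these could be produced with the following INSERT statements (values taken from the table above):
INSERT INTO my_table (pk1, pk2, ck1, ck2, s, d1, d2) VALUES (1, 10, 100, 1000, 10000, 100000, 1000000);
INSERT INTO my_table (pk1, pk2, ck1, ck2, s, d1, d2) VALUES (1, 10, 100, 1001, 10000, 100001, 1000001);
INSERT INTO my_table (pk1, pk2, ck1, ck2, s, d1, d2) VALUES (2, 20, 200, 2000, 20000, 200000, 2000000);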
The internal structure will be:
      | :s    | 100:1000: | 100:1000:d1 | 100:1000:d2 | 100:1001: | 100:1001:d1 | 100:1001:d2 |
 -----+-------+-----------+-------------+-------------+-----------+-------------+-------------+
 1:10 | 10000 |           | 100000      | 1000000     |           | 100001      | 1000001     |

      | :s    | 200:2000: | 200:2000:d1 | 200:2000:d2 |
 -----+-------+-----------+-------------+-------------+
 2:20 | 20000 |           | 200000      | 2000000     |
Each partition is one wide row keyed by pk1:pk2. The static column s is stored once per partition (shown here as :s), the empty-valued cells such as 100:1000: are CQL row markers, and every regular-column cell name carries the clustering values as a prefix.
So the size of a single partition will be (using 4 bytes per int; note that the clustering values are repeated in the name of every regular-column cell):
Single Partition Size = (pk1 + pk2 + ck1 + ck2) + s + 2 rows * ((d1 + (ck1 + ck2)) + (d2 + (ck1 + ck2)))
                      = (4 + 4 + 4 + 4) + 4 + 2 * ((4 + (4 + 4)) + (4 + (4 + 4))) bytes
                      = 68 bytes
Estimated Table Size = Single Partition Size * Number of Partitions
                     = 68 * 2 bytes
                     = 136 bytes
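Generalizing the arithmetic above (one reading of the book's formula, reconstructed from this example, and ignoring per-cell metadata such as timestamps):
Partition Size = sum(partition key columns) + sum(clustering columns) + sum(static columns)
               + N_rows * sum over regular columns of (column size + sum(clustering columns))
This is why the clustering column sizes appear once per regular column (they are repeated in every cell name), and why the partition key is not multiplied by the number of partitions: the formula sizes a single partition, and you multiply by the partition count afterwards.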
More: http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/