
What does cassandra do during compaction?

I know that Cassandra merges SSTables and row keys, removes tombstones, and so on.

  1. But I am really interested to know: how does it perform compaction?

  2. As SSTables are immutable, does it copy all the relevant data to a new file, discarding the tombstone-marked data while writing that new file?

I know what compaction does, but I want to know how it makes this happen.

asked by samarth, Oct 03 '11 09:10


People also ask

When does compaction happen in Cassandra?

Apache Cassandra compaction is the process of reconciling the different copies of data stored in different SSTables. Compaction is a crucial background activity for both maintenance and read performance. Cassandra supports several compaction strategies, and different compaction operations run at different times.

How do I check my compaction in Cassandra?

If you grep the Cassandra log file for lines containing "Compacting", you will find the SSTables that are part of a compaction. If you sum their sizes and multiply by the inverse of the column family's compression ratio, you will get pretty close to the total amount of data being compacted.
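A back-of-the-envelope version of that estimate, as a Python sketch. It assumes you have already collected the on-disk sizes (in bytes) of the SSTables named in the "Compacting" log line, and that the compression ratio is expressed as compressed size divided by uncompressed size (so 0.25 means the data shrinks to a quarter). The function name is hypothetical, and the log format is not parsed here.

```python
def estimate_uncompressed_bytes(sstable_sizes, compression_ratio):
    """Estimate how much uncompressed data a compaction is processing.

    sstable_sizes: on-disk (compressed) sizes, in bytes, of the SSTables
        listed in a "Compacting" log line (collected by hand; hypothetical input).
    compression_ratio: compressed size / uncompressed size, e.g. 0.25.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    total_on_disk = sum(sstable_sizes)
    # Multiplying by the inverse of the ratio undoes the compression.
    return total_on_disk * (1 / compression_ratio)


if __name__ == "__main__":
    # Three hypothetical SSTables totalling 400 bytes, compressed to 25%.
    print(estimate_uncompressed_bytes([100, 200, 100], 0.25))  # 1600.0
```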

Does Cassandra compress data?

Cassandra lets operators configure compression on a per-table basis. Compression reduces the size of data on disk by compressing each SSTable in chunks of a user-configurable size (chunk_length_in_kb).
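To illustrate why the chunk size matters, here is a Python sketch of chunked compression. It uses zlib purely as a stand-in (Cassandra ships its own compressor classes, such as LZ4 and Snappy), and the offsets list is only a simplified analogue of the chunk-offset metadata a compressed file needs. Each chunk is compressed independently, so a read only has to decompress the chunk holding the requested data; smaller chunks mean less work per read but usually a worse compression ratio.

```python
import zlib


def compress_in_chunks(data: bytes, chunk_length_in_kb: int = 64):
    """Compress data in independent fixed-size chunks.

    Returns the concatenated compressed bytes plus the offset at which each
    compressed chunk starts, so a single chunk can be located and decompressed.
    """
    chunk_size = chunk_length_in_kb * 1024
    parts, offsets, position = [], [], 0
    for start in range(0, len(data), chunk_size):
        compressed = zlib.compress(data[start:start + chunk_size])
        offsets.append(position)
        parts.append(compressed)
        position += len(compressed)
    return b"".join(parts), offsets


def read_chunk(blob: bytes, offsets, index: int) -> bytes:
    """Decompress only the chunk at `index`, without touching the others."""
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(blob)
    return zlib.decompress(blob[start:end])


if __name__ == "__main__":
    data = bytes(range(256)) * 1000          # 256,000 bytes of sample data
    blob, offsets = compress_in_chunks(data, chunk_length_in_kb=64)
    print(len(offsets))                      # 4 chunks of up to 64 KiB
    assert read_chunk(blob, offsets, 1) == data[65536:131072]
```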

What is anti compaction in Cassandra?

Since SSTables can contain any range, we need to split out the ranges that were actually repaired, this is called anticompaction. It means that one SSTable is split in two - one containing repaired data and one containing unrepaired data.
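A minimal Python sketch of that split, assuming a toy model in which an SSTable is a dictionary from token to row and each repaired range is a (start, end] pair that does not wrap around the ring. The names are hypothetical; the point is only that one input produces two outputs, each still in token order.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an SSTable is a dict mapping token -> row,
# and a token range is a (start, end] pair that does not wrap around the ring.
SSTable = Dict[int, str]
TokenRange = Tuple[int, int]


def in_ranges(token: int, ranges: List[TokenRange]) -> bool:
    """True if the token falls inside any (start, end] range."""
    return any(start < token <= end for start, end in ranges)


def anticompact(sstable: SSTable, repaired_ranges: List[TokenRange]) -> Tuple[SSTable, SSTable]:
    """Split one SSTable into (repaired, unrepaired) halves by token range."""
    repaired: SSTable = {}
    unrepaired: SSTable = {}
    for token in sorted(sstable):            # keep output in token order
        target = repaired if in_ranges(token, repaired_ranges) else unrepaired
        target[token] = sstable[token]
    return repaired, unrepaired


if __name__ == "__main__":
    table = {5: "a", 15: "b", 25: "c"}
    print(anticompact(table, [(10, 20)]))   # ({15: 'b'}, {5: 'a', 25: 'c'})
```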


2 Answers

I hope this thread helps, provided you follow all the posts and comments in it:

http://comments.gmane.org/gmane.comp.db.cassandra.user/10577

AFAIK

Whenever a memtable is flushed from memory to disk, its contents are written (appended, not updated in place) to a newly created SSTable, sorted by row key.
SSTables are merged only during compaction.
Until then, the read path reads from every SSTable that contains the key you look up and merges the results before replying.
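A simplified Python model of that read path, assuming each SSTable is a dictionary from row key to a (timestamp, value) pair, that a value of None is a tombstone, and that the newest timestamp wins. Real Cassandra reconciles individual columns rather than whole rows and also consults the memtable; both are left out here, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each SSTable maps row key -> (write timestamp, value),
# where a value of None is a tombstone (a deletion marker).
Cell = Tuple[int, Optional[str]]
SSTable = Dict[str, Cell]


def read_row(key: str, sstables: List[SSTable]) -> Optional[str]:
    """Look the key up in every SSTable and keep the newest version."""
    newest: Optional[Cell] = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    # A missing key and a newest version that is a tombstone both read as None.
    return None if newest is None else newest[1]


if __name__ == "__main__":
    tables = [{"k": (1, "old")}, {"k": (3, "new")}, {"k": (2, "mid")}]
    print(read_row("k", tables))  # new
```

Every extra SSTable holding the key is one more lookup per read, which is a large part of why merging them through compaction pays off.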

There are two types: minor and major.

Minor compaction is triggered automatically whenever a new SSTable is created.
It may remove tombstones, but only those past the grace period (gc_grace_seconds), and not necessarily all of them.
When the minor compaction threshold is reached (4 by default), it compacts SSTables of similar size (initially the memtable flush size) into one.
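A rough Python sketch of how similarly sized SSTables might be grouped for a minor compaction, loosely modeled on Cassandra's size-tiered approach. The 0.5 and 1.5 bucket bounds and the choice of the first qualifying bucket are assumptions made for illustration; min_threshold=4 mirrors the default mentioned above.

```python
from typing import List, Optional


def bucket_by_size(sizes: List[int], low: float = 0.5, high: float = 1.5) -> List[List[int]]:
    """Group sizes so each stays within [low, high] times its bucket's average."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if low * average <= size <= high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def pick_minor_compaction(sizes: List[int], min_threshold: int = 4) -> Optional[List[int]]:
    """Return the first bucket with at least min_threshold SSTables, if any."""
    for bucket in bucket_by_size(sizes):
        if len(bucket) >= min_threshold:
            return bucket
    return None


if __name__ == "__main__":
    # Four flush-sized SSTables (in MB) plus one older, much larger one.
    print(pick_minor_compaction([10, 10, 11, 9, 100]))  # [9, 10, 10, 11]
```

The large SSTable is left alone until enough other SSTables of its size accumulate, which is why sizes tend to grow in tiers.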

Major compaction is triggered manually using nodetool (nodetool compact).
It can be run against a specific column family.
It compacts all the SSTables of a CF into one.
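To connect this to the question of whether the data is copied into a new file, here is a hedged Python sketch of the merge step. Because every SSTable is already sorted by row key, the inputs can be streamed through a k-way merge, the newest version of each key kept, and tombstones older than the grace period dropped; everything else goes into a new sorted output (a list here, a file in reality). The entry layout, the single timestamp used as both write and deletion time, and the function name are assumptions. Dropping a tombstone is only safe when no SSTable outside the compaction can still hold an older copy of that key, which is the case when all of a CF's SSTables are compacted together.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Tuple

# Hypothetical model of one SSTable entry: (row key, timestamp, value).
# value None is a tombstone; for simplicity the timestamp doubles as the
# deletion time and is measured in seconds.
Entry = Tuple[str, int, Optional[str]]


def compact(sstables: List[List[Entry]], now: int, gc_grace: int) -> List[Entry]:
    """Merge key-sorted SSTables into one new, key-sorted SSTable."""
    by_key = itemgetter(0)
    # Streaming k-way merge: only the next entry of each input is held at a time.
    merged = heapq.merge(*sstables, key=by_key)
    output: List[Entry] = []
    for _key, versions in groupby(merged, key=by_key):
        newest = max(versions, key=itemgetter(1))  # last write wins
        _, timestamp, value = newest
        if value is None and timestamp + gc_grace <= now:
            continue  # tombstone past its grace period: leave it out of the new file
        output.append(newest)
    return output


if __name__ == "__main__":
    old = [("a", 1, "x"), ("c", 5, None)]
    new = [("a", 3, "y"), ("b", 2, "z")]
    print(compact([old, new], now=100, gc_grace=10))  # [('a', 3, 'y'), ('b', 2, 'z')]
```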

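After compacting, the old SSTables are marked as obsolete (compacted) rather than deleted immediately; the JVM garbage collector takes care of actually freeing that disk space once nothing references them any more.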
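The idea behind that deferred cleanup is "delete the file only when nothing still reads it". The Python sketch below expresses the same idea with explicit reference counting instead of relying on a garbage collector; this is a substitute technique chosen for clarity, not the mechanism the answer describes, and the class and method names are hypothetical.

```python
import os
import tempfile


class SSTableFile:
    """Hypothetical handle that deletes its file once obsolete and unreferenced."""

    def __init__(self, path: str):
        self.path = path
        self.readers = 0
        self.obsolete = False

    def acquire(self) -> None:
        if self.obsolete:
            raise RuntimeError("cannot start a new read on a compacted SSTable")
        self.readers += 1

    def release(self) -> None:
        self.readers -= 1
        self._delete_if_unused()

    def mark_obsolete(self) -> None:
        # Called once compaction has written the replacement SSTable.
        self.obsolete = True
        self._delete_if_unused()

    def _delete_if_unused(self) -> None:
        if self.obsolete and self.readers == 0 and os.path.exists(self.path):
            os.remove(self.path)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix="-Data.db")
    os.close(fd)
    table = SSTableFile(path)
    table.acquire()                # an in-flight read still uses the file
    table.mark_obsolete()          # compaction finished
    print(os.path.exists(path))    # True: the reader still holds it
    table.release()
    print(os.path.exists(path))    # False: deleted after the last release
```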

Regards, Tamil

answered by Tamil, Oct 14 '22 13:10


There are two ways to run compaction:

A- Minor compaction, run automatically. B- Major compaction, run manually.

In both cases, compaction takes some number of files (per CF) and processes them. In this process it marks data whose TTL has expired as tombstones, removes existing tombstones whose grace period has passed, and writes the result to a new file. The tombstones generated in this compaction are removed in a later compaction, once the grace period (gc_grace) has elapsed.
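A Python sketch of that per-cell decision, under stated assumptions: the Cell fields are hypothetical, times are plain seconds, and a tombstone becomes purgeable once its deletion time plus gc_grace has passed. One nuance: if a TTL expired so long ago that the grace period has also elapsed, this sketch drops the cell in the same pass; otherwise the newly created tombstone is kept and left for a later compaction, as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell model; all times are in seconds."""
    value: Optional[str]               # None marks a tombstone
    deleted_at: Optional[int] = None   # when the tombstone was created
    expires_at: Optional[int] = None   # TTL expiry time, if the cell has a TTL


def compact_cell(cell: Cell, now: int, gc_grace: int) -> Optional[Cell]:
    """Return the cell to write to the new SSTable, or None to drop it."""
    if cell.value is not None and cell.expires_at is not None and cell.expires_at <= now:
        # TTL has expired: rewrite the data as a tombstone dated at its expiry.
        cell = Cell(value=None, deleted_at=cell.expires_at)
    if cell.value is None and cell.deleted_at is not None and cell.deleted_at + gc_grace <= now:
        # Tombstone older than gc_grace: it can be purged.
        return None
    return cell


if __name__ == "__main__":
    print(compact_cell(Cell("v", expires_at=95), now=100, gc_grace=10))
    # Cell(value=None, deleted_at=95, expires_at=None)
```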

The difference between A and B is the number of files taken and the resulting file. A takes a few similar files (of similar size) and generates a new file; B takes ALL the files and generates only one big file.

answered by Christian Agrazar, Oct 14 '22 15:10