Memtable understanding

Question

I have some questions about the Cassandra memtable. I'd be grateful for any help.

Facts about memtable:

1) placed in RAM;

2) per-ColumnFamily structure;

3) multiple memtables may exist for a single column family;

Questions:

1) When are additional memtables created for a column family? What condition is needed? I assume that additional memtables are created after an additional commit log file is created. Is this true?

2) What happens once the commit log size threshold is reached? I assume memtables start being placed in a queue; once the queue fills up, memtables are flushed to SSTables, and afterwards the older commit log (on disk) and the corresponding memtables (in RAM) are removed. In that case, will some part of the memtable memory always be empty while the commit log is always 90-100% full?

3) What happens when the memtable size threshold is reached? Does flushing to SSTables start as in the previous case? Will some part of the commit log then always be empty while the memtable memory is always 90-100% full?

4) About memtable_allocation_type: the official resources say "offheap_buffers moves the cell name and value to DirectBuffer objects. This has the lowest impact on reads — the values are still “live” Java buffers — but only reduces heap significantly when you are storing large strings or blobs.". What does DirectBuffer mean? Is it placed in the Java heap? Can you give links to websites with information about it?

Thank you very much!

G Quintana · Accepted Answer

For a given Column Family there is usually a single Memtable in memory, except in special circumstances such as a repair process or pending flushes.
When the Commit Log is full, a flush is triggered: the Memtable is written to disk as an SSTable, then the Memtable is cleared and the Commit Log is recycled. A new cycle starts with an empty Commit Log and Memtable.
When a Memtable exceeds a given size, a flush is triggered in the same way.
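To make the two flush triggers concrete, here is a minimal toy model in Java. It is only a sketch and not Cassandra's actual code: the class name, the byte thresholds, and the rule "flush every table when the shared commit log is full" are assumptions made for illustration (real Cassandra is more selective about which Memtables it flushes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the two flush triggers; hypothetical, not Cassandra's code. */
class FlushTriggerSketch {
    // Hypothetical thresholds in bytes, chosen only for the example.
    static final long MEMTABLE_LIMIT = 100;   // per column family
    static final long COMMIT_LOG_LIMIT = 150; // shared by all column families

    // One memtable size per column family (sorted for a stable flush order).
    final Map<String, Long> memtableBytes = new TreeMap<>();
    final List<String> flushLog = new ArrayList<>();
    long commitLogBytes = 0;

    /** Record a write in the commit log and the memtable, then check both limits. */
    void write(String table, long bytes) {
        commitLogBytes += bytes;
        long size = memtableBytes.merge(table, bytes, Long::sum);
        if (size >= MEMTABLE_LIMIT) {
            flush(table, "memtable full");
        } else if (commitLogBytes >= COMMIT_LOG_LIMIT) {
            // Simplification: flush every memtable so the whole log can be recycled.
            for (String t : new ArrayList<>(memtableBytes.keySet())) {
                flush(t, "commit log full");
            }
        }
    }

    /** "Write an SSTable": drop the memtable and release its commit log space. */
    void flush(String table, String reason) {
        long flushed = memtableBytes.remove(table);
        commitLogBytes -= flushed;
        flushLog.add(table + ": " + reason);
    }

    public static void main(String[] args) {
        FlushTriggerSketch node = new FlushTriggerSketch();
        node.write("a", 60);
        node.write("b", 60);
        node.write("a", 20);
        node.write("b", 20);  // shared commit log reaches its limit first
        node.write("a", 100); // a single memtable reaches its own limit
        System.out.println(node.flushLog);
    }
}
```

In this model neither structure stays permanently near 100% full: whichever limit is hit first triggers the flush, and both counters drop afterwards.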
By default, the Memtable is kept in Java heap memory. As of Cassandra 2.1, Memtables can be stored outside the Java heap to relieve GC pressure; however, this setting is an optimisation for special cases. Cassandra can store data outside the Java heap using JNA, which means the data is not subject to garbage collection because the JVM does not know about it. However, Java objects must be converted when they are stored in or retrieved from that memory, which is why they are not considered "live".
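As for "DirectBuffer": in plain Java this most likely refers to a direct java.nio.ByteBuffer, created with ByteBuffer.allocateDirect. Its contents may reside outside the normal garbage-collected heap, while the small ByteBuffer wrapper object itself is still an ordinary Java object. The sketch below only illustrates that standard JDK API; the class and method names are hypothetical and this is not how Cassandra itself is written.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustration of heap vs. direct buffers; hypothetical helper, not Cassandra code. */
class DirectBufferSketch {
    /** Copy a value into a direct buffer whose contents live in native memory. */
    static ByteBuffer toOffHeap(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
        direct.put(bytes);
        direct.flip(); // switch from writing to reading
        return direct;
    }

    /** Copy the buffer's contents back into an on-heap String. */
    static String fromOffHeap(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.duplicate().get(bytes); // duplicate() leaves the caller's position untouched
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(16);   // backed by a byte[] on the heap
        ByteBuffer offHeap = toOffHeap("cell value");  // backed by native memory
        System.out.println("heap buffer direct?   " + onHeap.isDirect());
        System.out.println("direct buffer direct? " + offHeap.isDirect());
        System.out.println("round trip: " + fromOffHeap(offHeap));
    }
}
```

The copy in fromOffHeap shows the conversion cost mentioned above: off-heap bytes have to be copied back into heap objects before ordinary Java code can use them as a String.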

I advise you to watch https://academy.datastax.com/courses/learning-cassandra-write-path

Tags:

nosql

cassandra

bissquit

1 Answer

G Quintana