 

How to delete a large number of rows in Cassandra (and avoid potential tombstone issues)?

Oversimplifying the data model, we have the following tables:

CREATE TABLE storage (
    id timeuuid,
    foo blob,
    bar blob,
    baz blob,
    data blob,
    PRIMARY KEY ((id))
);

CREATE TABLE storage_idx_by_foo (
    foo blob,
    id timeuuid,
    PRIMARY KEY ((foo), id)
);

CREATE TABLE storage_idx_by_bar (
    bar blob,
    id timeuuid,
    PRIMARY KEY ((bar), id)
);

CREATE TABLE storage_idx_by_baz (
    baz blob,
    id timeuuid,
    PRIMARY KEY ((baz), id)
);

The first table can contain hundreds of millions of records, and we use the index tables to easily locate data based on some queryable parameters.
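For example, a lookup by foo is a two-step read: first the index table, then the main table (a sketch; the literal values below are just placeholders):

-- find the ids for a given foo value (blob literals are written as hex in CQL)
SELECT id FROM storage_idx_by_foo WHERE foo = 0xCAFEBABE;

-- then fetch each matching record from the main table by its id
SELECT data FROM storage WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f;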

The problem comes when we have to purge data based on either foo, bar, or baz. We have to delete the entry from the storage table and from all the index tables. So, assuming we delete by e.g. foo, the steps are (see the CQL sketch after the list):

  1. Find the id based on the appropriate index table (in this case storage_idx_by_foo)
  2. Get the bar and baz and delete the record from the storage table
  3. Delete records from the remaining two index tables (we have bar/baz and id)
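
In CQL, the purge-by-foo flow looks roughly like this (the literal values are placeholders, and steps 2 and 3 are repeated for every matching id):

-- 1. find the ids for the foo value being purged
SELECT id FROM storage_idx_by_foo WHERE foo = 0xCAFEBABE;

-- 2. read bar and baz for each id, then delete the main record
SELECT bar, baz FROM storage WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f;
DELETE FROM storage WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- 3. delete the corresponding rows from the remaining two index tables
DELETE FROM storage_idx_by_bar WHERE bar = 0xB0B0 AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;
DELETE FROM storage_idx_by_baz WHERE baz = 0xBA5EBA11 AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;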

Step 3 is a problem because of tombstones: if we delete millions of records from the two remaining index tables (individual rows, not whole partitions), Cassandra will create millions of tombstones, which will cause a lot of headaches when reading data before compaction happens.

Some quick brainstorming suggests we can:

  1. Force a compaction after the purging process
  2. Not delete from those two index tables, and handle index entries that point to no-longer-existing records in application code
  3. ????

What is the suggested approach? I am guessing other Cassandra users have encountered this issue as well, but I couldn't find any advice online other than "you are doing Cassandra wrong". I don't think we could have modelled our data differently to avoid this problem (or if we could, I would appreciate feedback on that as well).

Currently, we are leaning towards option number 2, although I do not like the idea of garbage being left in the database.

Asked by Miloš Ranđelović on Dec 29 '14




1 Answer

"You might be doing cassandra wrong"!!

What are your queries? Trying to generalise without knowing queries usually results in poor models in Cassandra. Modelling should really be query driven. Even if you don't know the exact queries, you should know the kinds of queries (i.e. what are you indexing on, etc.).

If you know you're going to index on foo, bar, and baz, think about whether you can add some constraint that can serve as the partition key. With the schema you've outlined, one major problem you'll run into is having tonnes of entries for the same foo value (or bar value, or baz value). While in theory a partition can be quite large, partitions larger than tens of megs (or a hundred or so megs) are going to be bad for performance. As such, when doing wide rows, think about how you can limit the wide row size. If each foo or bar or baz has a few hundred to a few thousand entries, this won't be an issue; otherwise, you're asking for trouble. In that case, you may want to add some form of bucketing. For example, see if you can limit your queries to "get me data for foo x on this date" or "get me data for foo x for this country / postcode / etc.". This will prevent monstrous wide rows.

Another problem with manual indexing is that index updates are not atomic, and the index may be on a different node to the actual data. If you can limit the queries to buckets, your schema could look like:

CREATE TABLE storage (
    some_bucket text,
    id timeuuid,
    foo blob,
    bar blob,
    baz blob,
    data blob,
    PRIMARY KEY (some_bucket, id)
);

Or you could even leave storage as is and have a single bucketed index table instead:

CREATE TABLE storage_idx (
    bucket text,
    id timeuuid,
    foo blob,
    bar blob,
    baz blob,
    PRIMARY KEY (bucket, id)
);

In both cases, you'd then create a Cassandra secondary index on foo, bar, and baz. That will allow your queries. Remember, when using a secondary index, always hit a partition first; otherwise it becomes a cluster-wide query that will likely time out. In Cassandra 3.0, a feature called global indexes is coming, which aims to remove the need to hit a partition first, but until then, hit a partition + secondary index and your queries will be fast.
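
For illustration (a sketch using the storage_idx table above; the index names and the bucket value are placeholders), the index creation and a partition-restricted query would look something like:

-- secondary indexes on the queryable columns of the bucketed index table
CREATE INDEX storage_idx_foo ON storage_idx (foo);
CREATE INDEX storage_idx_bar ON storage_idx (bar);
CREATE INDEX storage_idx_baz ON storage_idx (baz);

-- always restrict to a single bucket (partition) first, then filter on the indexed column
SELECT id FROM storage_idx WHERE bucket = '2014-12' AND foo = 0xCAFEBABE;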

Now, on to the topic of tombstones. Cassandra deletes use tombstones; there's no way around that. Any LSM-based database needs compaction, and tombstones are Cassandra's mechanism for achieving steady write throughput (almost) regardless of load. There are some things you can do, though. If you can limit when such massive deletes take place, you can use nodetool to disable autocompaction:

http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsDisableAutoCompaction.html

You can then do the purge, and then force a compaction:

http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCompact.html

And enable autocompaction again.
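
As a rough sketch (the keyspace name here is just a placeholder), the sequence with nodetool would be something like:

# stop automatic compactions on the affected tables before the purge
nodetool disableautocompaction my_keyspace storage_idx_by_bar storage_idx_by_baz

# ... run the purge ...

# force a major compaction, then turn autocompaction back on
nodetool compact my_keyspace storage_idx_by_bar storage_idx_by_baz
nodetool enableautocompaction my_keyspace storage_idx_by_bar storage_idx_by_baz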

This is obviously not "neat", but it works when you're deleting a large amount of data from a table without deleting all of it.

Hope that helps.

Answered by ashic