Back up a Cassandra keyspace using nodetool

Tags:

cassandra

I am using Cassandra on Ubuntu 14.04. From the documentation, I could see that running the command:

nodetool snapshot <keyspace name>

creates a snapshot of my keyspace.

The output of the command is:

nodetool snapshot my_keyspace                       
Requested creating snapshot(s) for [my_keyspace] with snapshot name [1455455429118]
Snapshot directory: 1455455429118

As per the docs, the snapshots should be present in the directories:

/var/lib/cassandra/data/my_keyspace/<table names>/snapshots/1455455429118

However, there is some hash value at the end of the table name.

I am not sure where it came from, or whether it will always be the same. For example, for the table user_agents, the snapshot directory is:

/var/lib/cassandra/data/my_keyspace/user_agents-147c8cc0d31c11e5aacb3b02dd594b59/snapshots/1455455429118

I am not sure what 147c8cc0d31c11e5aacb3b02dd594b59 represents.

I am trying to automate this process, and without knowing this seemingly random value I cannot tell which directory to pick. Is there any way to disable it, or to work it out from the output of the nodetool command?

asked Sep 26 '22 09:09

Mandeep Singh

1 Answer

From the Documentation.

Taking a snapshot

Snapshots are taken per node using the nodetool snapshot command. To take a global snapshot, run the nodetool snapshot command using a parallel ssh utility, such as pssh.
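A minimal sketch of the per-node approach described above, assuming passwordless SSH to every node and nodetool on the remote PATH. It uses a plain sequential ssh loop rather than pssh, and the function name snapshot_all_nodes and the SSH override variable are hypothetical names introduced only for this illustration.

```shell
# Hypothetical helper (bash): run "nodetool snapshot" on each listed node in turn.
# Usage: snapshot_all_nodes KEYSPACE TAG HOST...
# SSH can be overridden (for example SSH=echo) to do a dry run.
snapshot_all_nodes() {
    if [ "$#" -lt 3 ]; then
        echo "usage: snapshot_all_nodes KEYSPACE TAG HOST..." >&2
        return 2
    fi
    local keyspace=$1 tag=$2 host
    shift 2
    for host in "$@"; do
        # The same tag is used on every node, so all snapshot directories share one name.
        "${SSH:-ssh}" "$host" nodetool snapshot -t "$tag" "$keyspace" || return 1
    done
}
```

Calling it as SSH=echo snapshot_all_nodes my_keyspace nightly node1 node2 prints the commands instead of running them, which is a cheap way to check the host list first.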

A snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. You must have enough free disk space on the node to accommodate making snapshots of your data files. A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time because a snapshot prevents old obsolete data files from being deleted. After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place.
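To move the backup files elsewhere, as the paragraph above mentions, one option is to pack a snapshot directory into a tarball first. This is only a sketch: archive_snapshot is a hypothetical helper, and it assumes GNU or BSD tar with gzip support. Note that the archive is a real copy of the data, so unlike the hard-linked snapshot it needs as much disk space as the (compressed) SSTables themselves.

```shell
# Hypothetical helper (bash): pack one snapshot directory into a gzip-compressed tarball.
# Usage: archive_snapshot SNAPSHOT_DIR OUTPUT_FILE
# OUTPUT_FILE should live outside SNAPSHOT_DIR.
archive_snapshot() {
    local snapshot_dir=$1 out_file=$2
    if [ ! -d "$snapshot_dir" ]; then
        echo "not a directory: $snapshot_dir" >&2
        return 1
    fi
    # -C switches into the snapshot directory so the archive holds relative paths.
    tar -czf "$out_file" -C "$snapshot_dir" .
}
```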

Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace.

$ nodetool -h localhost -p 7199 snapshot mykeyspace

The snapshot is created in the data_directory_location/keyspace_name/table_name/snapshots/snapshot_name directory. Each snapshot directory contains numerous .db files that contain the data at the time of the snapshot.

Cassandra flushes the node before taking a snapshot, takes the snapshot, and stores the data in the snapshots directory of each keyspace in the data directory. If you do not specify the name of a snapshot directory using the -t option, Cassandra names the directory using the timestamp of the snapshot, for example 1391460334889. Follow the procedure for taking a snapshot before upgrading Cassandra. When upgrading, back up all keyspaces. For more information about snapshots, see the Apache documentation.

If you did not specify a snapshot name, Cassandra names snapshot directories using the timestamp of the snapshot. If the keyspace contains no data, empty directories are not created.
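For automation, choosing the snapshot name yourself with -t avoids having to parse the timestamp out of nodetool's output. A minimal sketch, assuming bash; take_snapshot, the backup_ prefix and the NODETOOL override variable are hypothetical names used only for this example.

```shell
# Hypothetical helper (bash): take a snapshot under a known tag and print that tag.
# Usage: take_snapshot KEYSPACE [TAG]
# NODETOOL can be overridden, for example to point at ./bin/nodetool.
take_snapshot() {
    local keyspace=$1
    # Default tag: backup_ followed by the current date and time.
    local tag=${2:-backup_$(date +%Y%m%d_%H%M%S)}
    # nodetool's own messages go to stderr so stdout carries only the tag.
    "${NODETOOL:-nodetool}" snapshot -t "$tag" "$keyspace" >&2 || return 1
    printf '%s\n' "$tag"
}
```

A script can then capture the tag with tag=$(take_snapshot my_keyspace) and use it to locate the snapshot directories afterwards.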

Example: Single table snapshot

Take a snapshot of only the playlists table in the music keyspace. On Linux, in the Cassandra bin directory, for example:

$ ./nodetool snapshot -cf playlists music

Requested creating snapshot(s) for [music] with snapshot name [1391461910600]
Snapshot directory: 1391461910600

Cassandra creates a snapshot directory named 1391461910600, containing the backup data of the playlists table, in:

/var/lib/cassandra/data/music/playlists-bf8118508cfd11e3972273ded3cb6170/snapshots

nodetool <options> snapshot ( 
  ( -cf <table> | --column-family <table> ) 
  ( -t <tag> | --tag <tag> )
  -- ( <keyspace> ) | ( <keyspace> ... )
)

Options are:
( -h | --host ) <host name or IP address>
( -p | --port ) <port number>
( -pw | --password ) <password>
( -u | --username ) <user name>
-cf or --column-family, followed by the name of the table to be backed up.
-t or --tag, followed by the snapshot name.
-- separates an option and an argument that could be mistaken for an option.
keyspace is a single keyspace name, required when using the -cf option, or one or more optional keyspace names separated by spaces.

UPDATE:

/var/lib/cassandra/data/music/playlists-bf8118508cfd11e3972273ded3cb6170/snapshots

Here, in playlists-bf8118508cfd11e3972273ded3cb6170, the suffix bf8118508cfd11e3972273ded3cb6170 is the table's UUID, written without hyphens.
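As an illustration of that naming scheme, the sketch below splits such a directory name back into the table name and a hyphenated UUID. It assumes bash (for substring expansion) and relies on table names never containing a hyphen; split_table_dir is a hypothetical helper. The resulting UUID should correspond to the table's ID in Cassandra's schema tables, though that is worth verifying on your own version.

```shell
# Hypothetical helper (bash): split "table-<32 hex chars>" into table name and UUID.
# Usage: split_table_dir PATH_OR_NAME   (no trailing slash)
split_table_dir() {
    local name=${1##*/}      # keep only the last path component
    local table=${name%-*}   # everything before the last hyphen
    local hex=${name##*-}    # everything after the last hyphen
    # Re-insert the hyphens in the standard 8-4-4-4-12 UUID layout.
    printf '%s %s-%s-%s-%s-%s\n' "$table" \
        "${hex:0:8}" "${hex:8:4}" "${hex:12:4}" "${hex:16:4}" "${hex:20:12}"
}
```

For the directory from the question, split_table_dir user_agents-147c8cc0d31c11e5aacb3b02dd594b59 prints user_agents 147c8cc0-d31c-11e5-aacb-3b02dd594b59.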

That is how Cassandra generates the directory name. There are also tools that monitor the SSTables being written and back those files up incrementally.

Check out tablesnap and cassandra snapshotter.
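For the automation asked about in the question, the UUID suffix does not need to be known in advance: a shell glob over the table directories finds every snapshot directory for a given tag. A minimal sketch, assuming bash and the directory layout shown above; find_snapshot_dirs is a hypothetical helper.

```shell
# Hypothetical helper (bash): list the snapshot directories of one keyspace for one tag.
# Usage: find_snapshot_dirs DATA_DIR KEYSPACE TAG
find_snapshot_dirs() {
    local data_dir=$1 keyspace=$2 tag=$3 dir
    # The * matches every "table-uuid" directory, whatever the UUID is.
    for dir in "$data_dir/$keyspace"/*/snapshots/"$tag"; do
        # An unmatched glob stays literal, so keep only real directories.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

For example, find_snapshot_dirs /var/lib/cassandra/data my_keyspace 1455455429118 would list one directory per table that has a snapshot with that name.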

answered Oct 11 '22 13:10

Renjith V R
