
Backup cassandra keyspace using nodetool

Tags:

cassandra

I am using Cassandra on Ubuntu 14.04. From the documentation, I could see that running the command:

nodetool snapshot <keyspace name> 

creates a snapshot of my keyspace.

The output of the command is:

nodetool snapshot my_keyspace                       
Requested creating snapshot(s) for [my_keyspace] with snapshot name [1455455429118]
Snapshot directory: 1455455429118

As per the docs, the snapshots should be present in the directories:

/var/lib/cassandra/data/my_keyspace/<table names>/snapshots/1455455429118

However, each table directory has what looks like a hash value appended to the table name.

I am not sure where that value comes from, or whether it will always be the same. For example, for the table user_agents, the snapshot directory is:

/var/lib/cassandra/data/my_keyspace/user_agents-147c8cc0d31c11e5aacb3b02dd594b59/snapshots/1455455429118

I am not sure what 147c8cc0d31c11e5aacb3b02dd594b59 represents.

I am trying to automate this process, and if I don't know this seemingly random value, I can't know which directory to pick. Is there any way to disable it, or to work it out from the output of the nodetool command?

asked Sep 26 '22 09:09 by Mandeep Singh




1 Answer

From the documentation:

Taking a snapshot

Snapshots are taken per node using the nodetool snapshot command. To take a global snapshot, run the nodetool snapshot command using a parallel ssh utility, such as pssh.
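As a rough illustration of a "global" snapshot, here is a minimal bash sketch. It swaps pssh for plain ssh background jobs, and assumes key-based ssh access to each node and nodetool on the remote PATH. The function name snapshot_all_nodes and the host names are hypothetical. Every node gets the same tag, so all nodes end up with identically named snapshot directories (default timestamp names would differ from node to node).

```shell
# Hypothetical helper: snapshot one keyspace on several nodes in parallel,
# all under the same tag. Assumes key-based ssh to each host and nodetool
# on the remote PATH; host names are placeholders supplied by the caller.
snapshot_all_nodes() {
  local keyspace="$1" tag="$2"
  shift 2
  local host pid failed=0
  local pids=()
  for host in "$@"; do
    # Run each node's snapshot in the background; /dev/null keeps the
    # background ssh from reading the script's stdin.
    ssh "$host" nodetool snapshot -t "$tag" "$keyspace" < /dev/null &
    pids+=("$!")
  done
  # Wait for every node and remember whether any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

For example, `snapshot_all_nodes my_keyspace nightly node1 node2 node3` returns non-zero if the snapshot failed on any node.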

A snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. You must have enough free disk space on the node to accommodate making snapshots of your data files. A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time because a snapshot prevents old obsolete data files from being deleted. After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place.
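Moving the files elsewhere and then removing the snapshot could look roughly like the bash sketch below. It assumes the default data directory from the question (CASSANDRA_DATA_DIR is a hypothetical override, not a Cassandra setting) and nodetool on PATH; archive_and_clear_snapshot and the destination layout are illustrative choices. It only runs nodetool clearsnapshot after at least one snapshot directory was copied successfully, so obsolete SSTables can then be deleted without losing the backup.

```shell
# Hypothetical helper: copy one snapshot out of the data directory, then
# release it with "nodetool clearsnapshot". Destination layout (an assumption):
#   <dest>/<keyspace>/<table>-<table id>/
archive_and_clear_snapshot() {
  local keyspace="$1" tag="$2" dest="$3"
  local data_dir="${CASSANDRA_DATA_DIR:-/var/lib/cassandra/data}"
  local src table_dir copied=0
  for src in "$data_dir/$keyspace"/*/snapshots/"$tag"; do
    # With no match, bash keeps the literal pattern; skip it.
    [ -d "$src" ] || continue
    # .../<table>-<table id>/snapshots/<tag> -> <table>-<table id>
    table_dir="$(basename "$(dirname "$(dirname "$src")")")"
    mkdir -p "$dest/$keyspace/$table_dir"
    # cp writes real copies, so the archive no longer depends on hard links.
    cp -pR "$src/." "$dest/$keyspace/$table_dir/" || return 1
    copied=$((copied + 1))
  done
  # Refuse to clear anything if no snapshot directory was found for this tag.
  [ "$copied" -gt 0 ] || return 1
  nodetool clearsnapshot -t "$tag" -- "$keyspace"
}
```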

Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace.

$ nodetool -h localhost -p 7199 snapshot mykeyspace

The snapshot is created in data_directory_location/keyspace_name/table_name/snapshots/snapshot_name directory. Each snapshot directory contains numerous .db files that contain the data at the time of the snapshot.

Cassandra flushes the node before taking a snapshot, takes the snapshot, and stores the data in the snapshots directory of each keyspace in the data directory. If you do not specify the name of a snapshot directory using the -t option, Cassandra names the directory using the timestamp of the snapshot, for example 1391460334889. Follow the procedure for taking a snapshot before upgrading Cassandra. When upgrading, back up all keyspaces. For more information about snapshots, see the Apache documentation.

If the keyspace contains no data, empty directories are not created.
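For automation, the -t option means the script picks the snapshot directory name instead of discovering a timestamp afterwards. A minimal bash sketch, assuming nodetool is on PATH; the function name take_tagged_snapshot and the backup-YYYYmmddHHMMSS default tag are hypothetical choices, not Cassandra conventions.

```shell
# Hypothetical helper: take a snapshot under a tag chosen by the script and
# print that tag, so later steps already know the snapshot directory name.
take_tagged_snapshot() {
  local keyspace="$1"
  # Use the caller's tag, or build a sortable timestamp tag (illustrative scheme).
  local tag="${2:-backup-$(date +%Y%m%d%H%M%S)}"
  # Send nodetool's own messages to stderr so stdout carries only the tag.
  nodetool snapshot -t "$tag" "$keyspace" >&2 || return 1
  printf '%s\n' "$tag"
}
```

For example, `tag="$(take_tagged_snapshot my_keyspace)"` leaves the snapshot name in `$tag` for the steps that copy the files.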

Example: Single table snapshot

Take a snapshot of only the playlists table in the music keyspace. On Linux, in the Cassandra bin directory, for example:

$ ./nodetool snapshot -cf playlists music

Requested creating snapshot(s) for [music] with snapshot name [1391461910600]
Snapshot directory: 1391461910600

Cassandra creates the snapshot directory named 1391461910600 that contains the backup data of the playlists table in

/var/lib/cassandra/data/music/playlists-bf8118508cfd11e3972273ded3cb6170/snapshots

nodetool <options> snapshot ( 
  ( -cf <table> | --column-family <table> ) 
  ( -t <tag> | --tag <tag> )
  -- ( <keyspace> ) | ( <keyspace> ... )
)
  • options are:
      • ( -h | --host ) <host name> | <ip address>
      • ( -p | --port ) <port number>
      • ( -pw | --password ) <password>
      • ( -u | --username ) <user name>

  • -cf, or --column-family, followed by the name of the table to be backed up.

  • -t or --tag, followed by the snapshot name.

  • -- separates an option from an argument that could be mistaken for an option.

  • keyspace is one keyspace name that is required when using the -cf option, or one or more optional keyspace names, separated by a space.

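UPDATE:

/var/lib/cassandra/data/music/playlists-bf8118508cfd11e3972273ded3cb6170/snapshots

Here, in playlists-bf8118508cfd11e3972273ded3cb6170, the suffix bf8118508cfd11e3972273ded3cb6170 is the table's UUID (its table ID, written without hyphens). Cassandra assigns this ID when the table is created, so it stays the same for the life of the table; if the table is dropped and recreated, it gets a new ID and therefore a new directory.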
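Because the suffix is only the table ID, a script does not need to know it: a glob over the table directories finds every snapshot with a given name. A minimal bash sketch, assuming the default data directory from the question; CASSANDRA_DATA_DIR is a hypothetical override, and find_snapshot_dirs and table_name_of are illustrative names.

```shell
# Hypothetical helper: print every snapshot directory for one keyspace and tag.
find_snapshot_dirs() {
  local keyspace="$1" tag="$2"
  local data_dir="${CASSANDRA_DATA_DIR:-/var/lib/cassandra/data}"
  local dir
  # "*" stands in for <table>-<table id>, so the ID never needs to be known.
  for dir in "$data_dir/$keyspace"/*/snapshots/"$tag"; do
    # With no match, bash keeps the literal pattern; print only real directories.
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Hypothetical helper: recover the table name from a snapshot directory path.
table_name_of() {
  # .../<table>-<table id>/snapshots/<tag> -> <table>-<table id>
  local table_dir
  table_dir="$(basename "$(dirname "$(dirname "$1")")")"
  # Table names cannot contain "-", so drop everything from the last "-".
  printf '%s\n' "${table_dir%-*}"
}
```

For the question's example, `find_snapshot_dirs my_keyspace 1455455429118` would print the user_agents-147c8cc0d31c11e5aacb3b02dd594b59 snapshot directory, and `table_name_of` on that path prints user_agents.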


That is why the directory is named that way. There are also tools that monitor the SSTables as they are written and back those files up incrementally.

Check out tablesnap and cassandra snapshotter.
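To show the idea behind such tools, here is a naive polling sketch in bash. It is not tablesnap's or cassandra snapshotter's code. It assumes incremental_backups: true is set in cassandra.yaml, so Cassandra hard-links each flushed SSTable into a backups/ subdirectory of the table directory, and it assumes the default data directory (CASSANDRA_DATA_DIR is a hypothetical override). The function name and destination layout are illustrative.

```shell
# Hypothetical helper: copy incremental-backup files that have not been copied
# yet, printing each new copy. Meant to be run periodically (e.g. from cron).
copy_new_incremental_files() {
  local keyspace="$1" dest="$2"
  local data_dir="${CASSANDRA_DATA_DIR:-/var/lib/cassandra/data}"
  local backups table_dir file target
  for backups in "$data_dir/$keyspace"/*/backups; do
    [ -d "$backups" ] || continue
    # .../<table>-<table id>/backups -> <table>-<table id>
    table_dir="$(basename "$(dirname "$backups")")"
    mkdir -p "$dest/$keyspace/$table_dir"
    for file in "$backups"/*; do
      [ -f "$file" ] || continue
      target="$dest/$keyspace/$table_dir/$(basename "$file")"
      # SSTable files are immutable once written, so an existing copy is skipped.
      [ -e "$target" ] && continue
      cp -p "$file" "$target" || return 1
      printf '%s\n' "$target"
    done
  done
}
```

Polling keeps the sketch simple; the tools above are the better choice in practice, since they handle details this sketch ignores, such as uploading to remote storage.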

answered Oct 11 '22 13:10 by Renjith V R