I am using Cassandra on Ubuntu 14.04. From the documentation, I could see that running the command:
nodetool snapshot <keyspace name>
creates a snapshot of my keyspace.
The output of the command is:
nodetool snapshot my_keyspace
Requested creating snapshot(s) for [my_keyspace] with snapshot name [1455455429118]
Snapshot directory: 1455455429118
As per the docs, the snapshots should be present in the directories:
/var/lib/cassandra/data/my_keyspace/<table names>/snapshots/1455455429118
However, there is what looks like a hash value appended to the table name.
I am not sure where it comes from, or whether it will always be the same. For example, for the table user_agents, the snapshot directory is:
/var/lib/cassandra/data/my_keyspace/user_agents-147c8cc0d31c11e5aacb3b02dd594b59/snapshots/1455455429118
I am not sure what 147c8cc0d31c11e5aacb3b02dd594b59 represents.
I am trying to automate this process, and without knowing this seemingly random value I cannot tell which directory to pick. Is there any way to disable it, or to work it out from the output of the nodetool command?
By default, incremental backup is disabled in Cassandra. It can be enabled by setting incremental_backups to true in the cassandra.yaml file. Once enabled, Cassandra creates a hard link to each SSTable flushed from a memtable, placing the link in a backups directory under the keyspace data directory.
From the documentation:
Snapshots are taken per node using the nodetool snapshot command. To take a global snapshot, run the nodetool snapshot command using a parallel ssh utility, such as pssh.
A snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. You must have enough free disk space on the node to accommodate making snapshots of your data files. A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time because a snapshot prevents old obsolete data files from being deleted. After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place.
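To see why a snapshot takes little space at first but can keep old data files alive, here is a minimal sketch of hard-link behaviour in Python. It does not touch Cassandra at all; the file name data.db and the function hard_link_demo are made up for illustration.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, then delete the original name."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.db")  # stand-in for an SSTable file
        snapshot_dir = os.path.join(tmp, "snapshots")
        os.mkdir(snapshot_dir)
        link = os.path.join(snapshot_dir, "data.db")

        with open(original, "wb") as f:
            f.write(b"x" * 1024)

        # A hard link is a second name for the same data, not a copy.
        os.link(original, link)
        same_inode = os.stat(original).st_ino == os.stat(link).st_ino
        link_count = os.stat(original).st_nlink

        # Removing the original name does not free the data while the
        # snapshot link still exists.
        os.remove(original)
        with open(link, "rb") as f:
            surviving_bytes = len(f.read())

        return same_inode, link_count, surviving_bytes
```

The data stays on disk until the last link to it is removed, which is why obsolete SSTables that a snapshot still links to keep using space.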
Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace.
$ nodetool -h localhost -p 7199 snapshot mykeyspace
The snapshot is created in the data_directory_location/keyspace_name/table_name/snapshots/snapshot_name directory. Each snapshot directory contains numerous .db files that contain the data at the time of the snapshot.
Cassandra flushes the node before taking a snapshot, takes the snapshot, and stores the data in the snapshots directory of each keyspace in the data directory. If you do not specify the name of a snapshot directory using the -t option, Cassandra names the directory using the timestamp of the snapshot, for example 1391460334889. Follow the procedure for taking a snapshot before upgrading Cassandra. When upgrading, back up all keyspaces. For more information about snapshots, see the Apache documentation.
If you did not specify a snapshot name, Cassandra names snapshot directories using the timestamp of the snapshot. If the keyspace contains no data, empty directories are not created.
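If you want to turn a default snapshot name back into a readable time, something like the following should work. It assumes the name is the time in milliseconds since the Unix epoch, which the 13-digit values in the examples suggest; snapshot_time is a hypothetical helper name.

```python
from datetime import datetime, timezone


def snapshot_time(snapshot_name):
    """Interpret a default snapshot name as milliseconds since the epoch (UTC).

    Assumption: the name is an epoch timestamp in milliseconds. A name set
    with -t is arbitrary text and will raise ValueError here.
    """
    millis = int(snapshot_name)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

This is only useful for snapshots taken without -t; for automation it is usually simpler to pass your own name with -t and skip the guessing.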
Example: Single table snapshot
Take a snapshot of only the playlists table in the music keyspace. On Linux, in the Cassandra bin directory, for example:
$ ./nodetool snapshot -cf playlists music
Requested creating snapshot(s) for [music] with snapshot name [1391461910600]
Snapshot directory: 1391461910600
Cassandra creates a snapshot directory named 1391461910600, which contains the backup data of the playlists table, in
/var/lib/cassandra/data/music/playlists-bf8118508cfd11e3972273ded3cb6170/snapshots
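For automation you do not need to know the suffix after the table name in advance: you can read the snapshot name from the nodetool output and match the table directory with a wildcard. A rough sketch, assuming the output format and directory layout shown above; parse_snapshot_name and find_snapshot_dirs are hypothetical helper names, not part of Cassandra.

```python
import re
from pathlib import Path


def parse_snapshot_name(nodetool_output):
    """Return the name from the 'Snapshot directory: <name>' line, or None."""
    match = re.search(
        r"^Snapshot directory:\s*(\S+)\s*$", nodetool_output, re.MULTILINE
    )
    return match.group(1) if match else None


def find_snapshot_dirs(data_dir, keyspace, snapshot_name, table=None):
    """List snapshot directories for a keyspace, optionally for one table.

    The '-*' in the pattern absorbs the suffix after the table name, so the
    caller never has to know it. Assumes the table-suffix layout shown above.
    """
    table_pattern = f"{table}-*" if table else "*"
    keyspace_dir = Path(data_dir) / keyspace
    pattern = f"{table_pattern}/snapshots/{snapshot_name}"
    return sorted(p for p in keyspace_dir.glob(pattern) if p.is_dir())
```

For example, find_snapshot_dirs("/var/lib/cassandra/data", "music", "1391461910600", table="playlists") would return the playlists snapshot directory if it exists on that node. Because the pattern requires a hyphen right after the table name, a table called user does not accidentally match user_agents.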
nodetool <options> snapshot (
( -cf <table> | --column-family <table> )
( -t <tag> | --tag <tag> )
-- ( <keyspace> ) | ( <keyspace> ... )
)
options are:
( -h | --host )
( -p | --port )
( -pw | --password )
( -u | --username )
-cf, or --column-family, followed by the name of the table to be backed up.
-t, or --tag, followed by the snapshot name.
-- separates an option and an argument that could be mistaken for an option.
keyspace is one keyspace name that is required when using the -cf option, or one or more optional keyspace names, separated by a space.
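If you are scripting this, it can help to assemble the command from those options in one place. A minimal sketch that uses only the flags listed above (-h, -p, -cf, -t); build_snapshot_command is a hypothetical helper, and it only builds the argument list without running anything.

```python
def build_snapshot_command(keyspaces=(), table=None, tag=None, host=None, port=None):
    """Build an argument list for 'nodetool snapshot' from the options above."""
    keyspaces = list(keyspaces)
    # Per the syntax description, -cf needs exactly one keyspace name.
    if table is not None and len(keyspaces) != 1:
        raise ValueError("-cf requires exactly one keyspace")

    cmd = ["nodetool"]
    if host is not None:
        cmd += ["-h", host]
    if port is not None:
        cmd += ["-p", str(port)]
    cmd.append("snapshot")
    if table is not None:
        cmd += ["-cf", table]
    if tag is not None:
        cmd += ["-t", tag]
    cmd += keyspaces
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True). Passing a tag of your own choosing means you already know the snapshot directory name and do not have to parse it from the output.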
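UPDATE:
/var/lib/cassandra/data/music/playlists-bf8118508cfd11e3972273ded3cb6170/snapshots
Here, in playlists-bf8118508cfd11e3972273ded3cb6170, the suffix bf8118508cfd11e3972273ded3cb6170 is the table's UUID, written without hyphens. Cassandra assigns it when the table is created and appends it to the table's directory name, so it does not change from one snapshot to the next.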
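If you ever need the table name and the ID separately, the directory name can be split on its last hyphen. A small sketch, assuming the suffix is always 32 hex digits as in the examples here; split_table_dir is a hypothetical helper name.

```python
import uuid


def split_table_dir(dir_name):
    """Split 'table-<32 hex digits>' into (table_name, uuid.UUID)."""
    table, sep, suffix = dir_name.rpartition("-")
    if not sep or not table or len(suffix) != 32:
        raise ValueError(f"unexpected table directory name: {dir_name!r}")
    # uuid.UUID raises ValueError if the suffix is not valid hex.
    return table, uuid.UUID(hex=suffix)
```

With the directory from the question, split_table_dir("user_agents-147c8cc0d31c11e5aacb3b02dd594b59") gives the table name user_agents and a UUID object whose string form has the usual hyphens.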
So the directory name is generated that way. There are also tools that monitor the SSTables as they are written and back those files up incrementally; check out tablesnap and cassandra snapshotter.