Most of the ElasticSearch documentation discusses working with the indexes through the REST API - is there any reason I can't simply move or delete index folders from the disk?
Yes, deleting the index deletes all the data in that index.
Taking a snapshot is the only reliable and supported way to back up a cluster. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup.
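As a rough sketch of what the snapshot route looks like from the command line, the helpers below register a shared-filesystem repository and take a snapshot through the `_snapshot` REST endpoints. The `ES_URL` variable, the function names, and any repository or snapshot names you pass in are assumptions made for illustration, and the repository location has to be allowed via `path.repo` in elasticsearch.yml on every node.

```shell
#!/bin/sh
# Sketch only: ES_URL and all function names here are illustrative assumptions.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Print the URL of a snapshot repository, or of a snapshot inside it.
snapshot_url() {
    repo="$1"
    snap="${2:-}"
    if [ -n "$snap" ]; then
        printf '%s/_snapshot/%s/%s\n' "$ES_URL" "$repo" "$snap"
    else
        printf '%s/_snapshot/%s\n' "$ES_URL" "$repo"
    fi
}

# register_repo REPO LOCATION
# Register a shared-filesystem ("fs") repository. LOCATION must be
# covered by path.repo in elasticsearch.yml on every node.
register_repo() {
    curl -sS -XPUT "$(snapshot_url "$1")" \
        -H 'Content-Type: application/json' \
        -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$2\"}}"
}

# take_snapshot REPO SNAPSHOT
# Snapshot all indices and wait until the snapshot has finished.
take_snapshot() {
    curl -sS -XPUT "$(snapshot_url "$1" "$2")?wait_for_completion=true"
}
```

With a hypothetical repository called my_backup, usage would be along the lines of `register_repo my_backup /mnt/es_backups` followed by `take_snapshot my_backup snapshot_1`.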
Indexes are stored on disk in the location configured in elasticsearch.yml with the configuration option path.data. The HTTP REST interface listens on localhost port 9200 by default, and the path of the URL generally defines the action to be taken (such as searching for documents).
You can move data around on disk, to a point:
If Elasticsearch is running, it is never a good idea to move or delete the index folders, because Elasticsearch will not know what happened to the data, and you will get all kinds of FileNotFoundExceptions in the logs as well as indices that are red until you manually delete them.
If Elasticsearch is not running, you can move index folders to another node (for instance, if you were decommissioning a node permanently and needed to get the data off). However, if you delete the folder or move it to a place where Elasticsearch cannot see it when the service is restarted, then Elasticsearch will be unhappy. This is because Elasticsearch writes what is known as the cluster state to disk, and the indices are recorded in this cluster state. If ES starts up and expects to find index "foo", but you have deleted the "foo" index directory, the index will stay in a red state until it is deleted through the REST API.
Because of this, I would recommend using the REST API whenever possible if you want to move or delete individual indices, as it's possible to get ES into an unhappy state if you delete a folder in which it expects to find an index.
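To make that concrete, here is a minimal sketch of checking whether an index has gone red and then removing it through the API rather than from disk. It relies on the `_cat/indices` endpoint with its `h` column selector and on the delete index API. The `ES_URL` variable and the function names are assumptions for illustration, and the exact `_cat` output can vary between versions (a closed index may report no health at all).

```shell
#!/bin/sh
# Sketch only: ES_URL and the function names are illustrative assumptions.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# index_health INDEX
# Print one line with the columns: index, health, status.
index_health() {
    curl -sS "$ES_URL/_cat/indices/$1?h=index,health,status"
}

# is_red LINE
# Succeed if a line of index_health output has "red" in the health column.
is_red() {
    [ "$(printf '%s\n' "$1" | awk '{print $2}')" = "red" ]
}

# delete_index INDEX
# Delete through the REST API so the cluster state is updated as well.
delete_index() {
    curl -sS -XDELETE "$ES_URL/$1"
}
```

For the "foo" example above, something like `is_red "$(index_health foo)" && delete_index foo` would clean up the orphaned index.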
EDIT: I should mention that it's safe to copy (for backups) an index folder, from the perspective of Elasticsearch, because copying doesn't modify the contents of the folder. Sometimes people do this to perform backups outside of the snapshot & restore API.
I use this procedure: I close, back up, then delete the indexes.
curl -XPOST "http://127.0.0.1:9200/index_name/_close"
After this point all index data is on disk and in a consistent state, and no writes are possible. I copy the directory where the index is stored and then delete it:
curl -XDELETE "http://127.0.0.1:9200/index_name"
By closing the index, Elasticsearch stops all access to the index. Then I send a command to delete the index (and all corresponding files on disk).
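The close, copy, delete sequence can be wrapped in a small script. This is only a sketch under a few assumptions: `ES_URL`, `DRY_RUN`, and the `run` and `archive_index` names are made up for illustration, and you have to supply the on-disk directory of the index yourself, since its location under path.data depends on the Elasticsearch version. Each step stops the sequence if it fails, so the index is not deleted unless the copy succeeded.

```shell
#!/bin/sh
# Sketch only: ES_URL, DRY_RUN, run and archive_index are illustrative names.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# archive_index INDEX INDEX_DIR BACKUP_DIR
# INDEX_DIR is the directory of the index on disk (supplied by the caller).
archive_index() {
    index="$1"
    index_dir="$2"
    backup_dir="$3"
    # 1. Close the index so no further reads or writes reach it.
    run curl -sSf -XPOST "$ES_URL/$index/_close" || return 1
    # 2. Copy the index directory, preserving permissions and timestamps.
    run cp -a "$index_dir" "$backup_dir/" || return 1
    # 3. Delete the index through the API, which also removes its files.
    run curl -sSf -XDELETE "$ES_URL/$index"
}
```

Setting `DRY_RUN=1` before calling `archive_index` prints the three commands without touching the cluster, which is a cheap way to check the paths first.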