 

How to delete data from a specific index in Elasticsearch after a certain period?

I have an index in Elasticsearch that is populated with JSON documents, each carrying a timestamp. I want to delete data from that index.

curl -XDELETE http://localhost:9200/index_name

The command above deletes the whole index. My requirement is to delete only data older than a certain period (for example, one week). Can I automate the deletion process?

I tried deleting with Curator.

But I think it deletes whole indices selected by their timestamp, not data within an index. Can Curator be used to delete data within an index?

I would be glad to know whether any of the following would work:

  • Can curl automate deleting data from an index after a period?
  • Can Curator automate deleting data from an index after a period?
  • Is there any other way, such as a Python script, to do the job?

My references are from the official Elasticsearch documentation.

Thanks a lot in advance.

asked Mar 14 '19 at 09:03 by ADARSH K



People also ask

How do I delete a particular data index in Elasticsearch?

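You can delete a whole index, a document type (not supported in recent Elasticsearch versions), or a single document by ID. These are the three ways: curl -XDELETE localhost:9200/index_name (whole index), curl -XDELETE localhost:9200/index_name/doc-type (document type), and curl -XDELETE localhost:9200/index_name/doc-type/id (single document).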
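For readers who asked about Python scripting, here is a minimal sketch of the index and single-document deletes using only the standard library. It assumes a local, unsecured node at `http://localhost:9200` (as in the question) and the `_doc` document endpoint of Elasticsearch 7+; the helper names are hypothetical. The functions only build the requests, so nothing is deleted until you pass one to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

# Assumption: a local, unsecured node, as in the question's curl command.
BASE_URL = "http://localhost:9200"


def delete_index_request(index: str) -> urllib.request.Request:
    """Build (without sending) the equivalent of `curl -XDELETE localhost:9200/<index>`."""
    return urllib.request.Request(f"{BASE_URL}/{urllib.parse.quote(index, safe='')}", method="DELETE")


def delete_document_request(index: str, doc_id: str, doc_type: str = "_doc") -> urllib.request.Request:
    """Build a DELETE request for a single document.

    `_doc` is the document endpoint in Elasticsearch 7+; on older versions pass
    the mapping type name instead.
    """
    path = "/".join(urllib.parse.quote(part, safe="") for part in (index, doc_type, doc_id))
    return urllib.request.Request(f"{BASE_URL}/{path}", method="DELETE")


if __name__ == "__main__":
    req = delete_document_request("index_name", "42")
    # Only prints the request; send it with urllib.request.urlopen(req).
    print(req.get_method(), req.full_url)
```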

Does deleting index in Elasticsearch delete data?

Yes, deleting an index deletes all the data in that index.

How do I truncate Elasticsearch index?

You need to delete the index and then recreate it, which means setting up your mapping again. Another option is deleting by query, but that only marks documents as deleted in the Lucene index; they are merged out over time, so disk space is not freed immediately.
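The delete-then-recreate approach can be sketched as two requests sent in order. This assumes a local node at `http://localhost:9200` and the Elasticsearch 7+ mapping layout (`{"mappings": {"properties": ...}}`); the `timestamp` mapping and the helper name are hypothetical. The mapping must be saved before the delete, because deleting the index discards it together with the data.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured node


def truncate_requests(index: str, mappings: dict) -> list:
    """Return the two requests that 'truncate' an index: DELETE it, then PUT it back.

    `mappings` must be captured beforehand (e.g. from GET <index>/_mapping),
    since the DELETE removes the mapping along with the documents.
    """
    url = f"{BASE_URL}/{urllib.parse.quote(index, safe='')}"
    body = json.dumps({"mappings": mappings}).encode("utf-8")
    return [
        urllib.request.Request(url, method="DELETE"),
        urllib.request.Request(url, data=body, method="PUT",
                               headers={"Content-Type": "application/json"}),
    ]


if __name__ == "__main__":
    # Hypothetical mapping with a single date field (Elasticsearch 7+ layout).
    mapping = {"properties": {"timestamp": {"type": "date"}}}
    for req in truncate_requests("index_name", mapping):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req)  # uncomment to send, in this order
```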

What is flush index in Elasticsearch?

Flushing a data stream or index is the process of making sure that any data that is currently only stored in the transaction log is also permanently stored in the Lucene index.

How do I delete data from an Elasticsearch index?

Elasticsearch indices can quickly fill up with gigabytes of data, especially if you’re logging from multiple servers many times a second. To manage data, Elasticsearch offers a “Delete By Query” API that removes all documents matching a query.
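A Delete By Query call is a POST to `<index>/_delete_by_query` with a JSON body holding the query. The sketch below builds that request with the standard library, assuming a local node at `http://localhost:9200`; the helper name and the example `match` query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured node


def delete_by_query_request(index: str, query: dict) -> urllib.request.Request:
    """Build (without sending) a POST <index>/_delete_by_query request for `query`."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{urllib.parse.quote(index, safe='')}/_delete_by_query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = delete_by_query_request("twitter", {"match": {"message": "some message"}})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```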

Can I use multiple indexes in Elasticsearch?

Yes. In practice you often don’t query individual indices directly: dashboards use index patterns, which can match multiple indices at once. This works because the indices themselves can act as groups of data, for example one index per day or month.
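To see which indices a wildcard pattern covers, the sketch below uses Python's `fnmatch` as a stand-in for index-pattern matching. This is an approximation, not Kibana's or Elasticsearch's implementation: only `*` is intended here, while `fnmatch` also treats `?` and `[...]` specially. The index names are hypothetical.

```python
import fnmatch


def matching_indices(pattern: str, indices: list) -> list:
    """Return the index names matched by a wildcard pattern such as "logs_*".

    fnmatch.fnmatchcase is case-sensitive, like Elasticsearch index names,
    and approximates `*` wildcard matching.
    """
    return [name for name in indices if fnmatch.fnmatchcase(name, pattern)]


if __name__ == "__main__":
    daily = ["logs_2019.03.11", "logs_2019.03.12", "metrics_2019.03.12"]
    print(matching_indices("logs_*", daily))
```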

How do I remove all timestamps from a query in Elasticsearch?

Use the Delete By Query API with a range query on the timestamp field to match documents whose timestamps are greater or less than a certain date, albeit a bit crudely.
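One way to build such a range query is to compute the cutoff on the client. This sketch, with hypothetical helper names, sends the cutoff as epoch milliseconds with an explicit `"format": "epoch_millis"`, so the value parses regardless of the date format the `timestamp` field was mapped with. Wrap the result in `{"query": ...}` to use it as a `_delete_by_query` body.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def older_than_query(field: str, cutoff: datetime) -> dict:
    """Range query matching documents whose `field` is on or before `cutoff`.

    `cutoff` should be timezone-aware; a naive datetime is interpreted as local time.
    """
    # Epoch milliseconds plus an explicit format avoids depending on the
    # date format in the field's mapping.
    millis = int(cutoff.timestamp() * 1000)
    return {"range": {field: {"lte": millis, "format": "epoch_millis"}}}


def older_than_days_query(field: str, days: int, now: Optional[datetime] = None) -> dict:
    """Range query for documents at least `days` days old, computed on the client."""
    if now is None:
        now = datetime.now(timezone.utc)
    return older_than_query(field, now - timedelta(days=days))


if __name__ == "__main__":
    fixed_now = datetime(2019, 3, 14, tzinfo=timezone.utc)
    print({"query": older_than_days_query("timestamp", 7, now=fixed_now)})
```

Computing the cutoff locally pins it to one moment; Elasticsearch date math such as `now-7d` is the server-side alternative.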

What to do when data become redundant in Elasticsearch?

When data become redundant, all you need is a tool that can send HTTP requests to Elasticsearch: its REST API can delete both individual documents and whole indices.


2 Answers

You can use the Delete By Query API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

It deletes all documents matching the provided query:

POST twitter/_delete_by_query
{
  "query": { 
    "match": {
      "message": "some message"
    }
  }
}

However, the recommended approach is to create a separate index for each time period (for example, one per day) and use Curator to drop old indices periodically, based on their age:

...
logs_2019.03.11
logs_2019.03.12
logs_2019.03.13
logs_2019.03.14
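The idea behind dropping old daily indices is to read the date out of each index name and compare it with a cutoff. The sketch below is a hypothetical helper, not Curator's code; it only illustrates the selection step, assuming names of the form `<prefix>YYYY.MM.DD`. Each returned name could then be removed with `DELETE <index>`.

```python
from datetime import date, datetime, timedelta


def expired_daily_indices(indices, prefix, keep_days, today):
    """Return daily indices (e.g. logs_2019.03.11) dated before today - keep_days.

    Names that do not look like <prefix><YYYY.MM.DD> are ignored, so unrelated
    indices are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in indices:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a dated index; leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["logs_2019.03.11", "logs_2019.03.12", "logs_2019.03.13", "logs_2019.03.14"]
    print(expired_daily_indices(names, "logs_", keep_days=2, today=date(2019, 3, 14)))
```

Because each index holds exactly one day, deleting it frees disk space at once, unlike delete by query.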
answered Nov 15 '22 at 06:11 by Enrichman



A simple example using the Delete By Query API:

POST index_name/_delete_by_query
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "lte": "2019-06-01 00:00:00.0",
            "format": "yyyy-MM-dd HH:mm:ss.S"
          }
        }
      }
    }
  }
}

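This deletes every record whose "timestamp" field (the date/time stored within the record, i.e. when the event occurred) is on or before the given date. Before deleting, you can run the same query as a search; the total hit count in the response shows how many records would be deleted: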

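GET index_name/_search
{
  "size": 1,
  "query": {
    "bool": {
      "filter": { "range": { "timestamp": { "lte": "2019-06-01 00:00:00.0", "format": "yyyy-MM-dd HH:mm:ss.S" } } }
    }
  }
}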
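As an alternative to a search with `size: 1`, the Count API returns only the number of matching documents. This sketch builds a `POST <index>/_count` request, assuming a local node at `http://localhost:9200`; the helper name is hypothetical. The `count` field of the response is how many documents a delete by query with the same query would target.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured node


def count_request(index: str, query: dict) -> urllib.request.Request:
    """Build (without sending) a POST <index>/_count request for `query`."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{urllib.parse.quote(index, safe='')}/_count",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    q = {"range": {"timestamp": {"lte": "2019-06-01 00:00:00.0", "format": "yyyy-MM-dd HH:mm:ss.S"}}}
    req = count_request("index_name", q)
    print(req.get_method(), req.full_url)
```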

It is also convenient to use relative dates (date math):

         "lte": "now-30d",

which would delete all records older than 30 days.
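Because date math such as `now-30d` is evaluated by Elasticsearch when the request runs, the same body can be reused by a scheduled job, which addresses the question's automation requirement. A minimal sketch with a hypothetical helper name, producing the body for `POST index_name/_delete_by_query`:

```python
import json


def relative_delete_body(field: str, days: int) -> str:
    """JSON body for _delete_by_query that removes documents at least `days` old.

    "now-<N>d" is Elasticsearch date math, resolved on the server at request
    time, so the body never needs to be regenerated.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    query = {"bool": {"filter": {"range": {field: {"lte": f"now-{days}d"}}}}}
    return json.dumps({"query": query})


if __name__ == "__main__":
    print(relative_delete_body("timestamp", 30))
```

A scheduler such as cron could then send this body to `POST index_name/_delete_by_query` once a day.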

answered Nov 15 '22 at 05:11 by georgep68
