
How do I configure elasticsearch to retain documents up to 30 days?

Is there a default data retention period in Elasticsearch? If so, can you help me find the configuration?

Nick asked Jul 21 '14 21:07




4 Answers

Per-document expiry via the _ttl field is no longer supported in Elasticsearch 5.0.0 or greater. The best practice is to create indexes periodically (daily is most common) and then delete each index once its data is old enough.

Here's the reference on how to delete an index: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
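As a rough illustration of the "one index per day, delete the old ones" approach, here is a minimal Python sketch that only works out which index names have aged past the retention window. The "logs-" prefix, the YYYY.MM.DD suffix, and the helper names are assumptions made for this example, not anything Elasticsearch requires. Actually removing each index would be a separate DELETE request per name, as described in the reference above.

```python
from datetime import date, datetime, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build the name of the index that holds one day's documents."""
    return f"{prefix}{day:%Y.%m.%d}"


def expired_indices(index_names, prefix, today, retention_days=30):
    """Return the daily indices whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical list of index names, e.g. as returned by a cluster listing.
names = ["logs-2024.01.01", "logs-2024.01.15", "logs-2024.02.14", "other-2023.01.01"]
print(daily_index_name("logs-", date(2024, 2, 15)))
print(expired_indices(names, "logs-", date(2024, 2, 15)))
```

Indices that do not match the prefix or the date pattern are skipped on purpose, so an unrelated index is never selected for deletion by accident.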

This article (though it's old enough to reference _ttl) also gives some insight: https://www.elastic.co/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers

As a reminder, it's best to protect your Elasticsearch cluster from the outside world via a proxy and restrict the methods that can be sent to your cluster. This way you can prevent your cluster from being ransomed.

Michael Kemmerer answered Oct 27 '22 23:10


Yes, in Elasticsearch versions before 5.0 you can set a TTL on the data. See the following page for the available configuration options:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html

goalie7960 answered Oct 27 '22 21:10


Elasticsearch Curator is the tool to use if you want to manage your indexes: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/index.html

Here's an example of how to delete indices based on age: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/ex_delete_indices.html

Combine it with cron to run the deletion at regular intervals.

Michael Christensen answered Oct 27 '22 23:10


There is no default retention period, but newer versions of Elasticsearch have index lifecycle management (ILM). It allows you to:

Delete stale indices to enforce data retention standards

Documentation.

Simple policy example:

PUT _ilm/policy/my_policy
{
    "policy": {
        "phases": {
            "delete": {
                "min_age": "30d",
                "actions": {
                    "delete": {}
                }
            }
        }
    }
}
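The same policy body can also be built and submitted from a script. The sketch below is one possible way to do that with only the Python standard library. The base URL http://localhost:9200 and the helper names are assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_delete_policy(retention_days: int) -> dict:
    """Build an ILM policy body with a single delete phase."""
    return {
        "policy": {
            "phases": {
                "delete": {
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                }
            }
        }
    }


def build_put_policy_request(base_url: str, policy_name: str, retention_days: int) -> Request:
    """Prepare (but do not send) the PUT _ilm/policy/<name> request."""
    body = json.dumps(build_delete_policy(retention_days)).encode("utf-8")
    return Request(
        url=f"{base_url}/_ilm/policy/{policy_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local, unauthenticated cluster address; adjust for a real deployment.
req = build_put_policy_request("http://localhost:9200", "my_policy", 30)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would submit it. Note that a policy has no effect until it is attached to indices, typically through the index.lifecycle.name setting in an index template, and that min_age is counted from index creation unless the index has rolled over.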

If you use OpenSearch on AWS, take a look at this documentation for the equivalent feature.

This is a pretty old question, but I ran into the same issue just now, so maybe this will be helpful for somebody else.

Timur Levadny answered Oct 27 '22 22:10