
Elasticsearch monthly rolling indices

I've been using Logstash to feed daily rolling indices in Elasticsearch, with something like this:

   output {
        elasticsearch {
                ....
                # dd = day of month (DD would be day of year)
                index => "myindex-%{+YYYY.MM.dd}"
        }
   }

It now turns out that I need monthly rolling indices instead. I had a look at http://logstash.net/docs/1.4.1/outputs/elasticsearch.html#index

But I'm still confused. Is the answer as simple as using myindex-%{+YYYY.MM} instead, so that the index rolls over at the end of each month?

Update: Here is an example of the "same" event (one with the same _id field) being indexed on two different days.

On day A, this is indexed:

   {_id: 123, message: "old message"}

On the following day B, this is indexed:

   {_id: 123, message: "updated message"}

So if day A and day B belong to two separate indices, I will get two events when my query looks back across all of these indices. To eliminate the duplicate, when indexing event B I do an additional check: I query by _id, remove the previously existing event A, and then index B.

With daily indices, I'm afraid this _id lookup will become more expensive as time goes on, which monthly indices can improve. Last but not least, if the event found by my check is in the current index (today's now, this month's after the change), I won't remove it but will let Elasticsearch do the update based on _id (essentially a delete/create as well, except that I don't need to do it in my code).
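The deduplication step described above can be sketched with plain dictionaries standing in for indices. This is only an illustration of the logic, not Elasticsearch client code: the `Indices` type and the `index_without_duplicates` function are hypothetical names, and the index names are made up for the example.

```python
from typing import Dict

# Hypothetical in-memory stand-in: index name -> {_id -> document}
Indices = Dict[str, Dict[str, dict]]


def index_without_duplicates(indices: Indices, current: str,
                             doc_id: str, doc: dict) -> None:
    """Remove doc_id from every other index, then write it to the current one."""
    for name, docs in indices.items():
        if name != current:
            docs.pop(doc_id, None)  # delete the stale copy, if there is one
    # Writing to the current index overwrites an existing _id (an update).
    indices.setdefault(current, {})[doc_id] = doc


# Day A landed in an older index; day B arrives for the current one.
indices: Indices = {"myindex-2015.05": {"123": {"message": "old message"}}}
index_without_duplicates(indices, "myindex-2015.06", "123",
                         {"message": "updated message"})
print(indices)
```

With monthly indices there are fewer indices to scan for a stale copy, and more updates fall into the "same index, plain overwrite" branch.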

Thanks

James Jiang asked Jun 24 '15 01:06




1 Answer

With the config you provide, the index name is built from the event's timestamp. Without additional configuration, that is the time at which the event was received by Logstash. However, it is often more useful to take the timestamp contained in the event itself, in which case that timestamp is used instead. Below is some sample code that I often use.

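filter {
  # Parse the event's own "timestamp" field and use it as the event time
  date {
      match => ["timestamp" , "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
output {
  elasticsearch {
    protocol => "transport"
    host => "localhost:9300"
    cluster => "mycluster"
    # One index per month, named from the event's timestamp
    index => "gridshore-logs-%{+YYYY.MM}"
  }
}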
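To make the rollover behaviour concrete, here is a minimal Python sketch of how a pattern like `myindex-%{+YYYY.MM}` maps event times to index names. It assumes, as far as I know, that Logstash formats the event timestamp in UTC. The `monthly_index` function is a hypothetical helper written for this illustration, not part of Logstash.

```python
from datetime import datetime, timezone


def monthly_index(prefix: str, event_time: datetime) -> str:
    """Mimic "prefix-%{+YYYY.MM}": format the event time in UTC as year.month."""
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m}"


last_of_june = datetime(2015, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
first_of_july = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

print(monthly_index("myindex", last_of_june))   # myindex-2015.06
print(monthly_index("myindex", first_of_july))  # myindex-2015.07
```

Nothing has to trigger the rollover: the first event whose timestamp falls in a new month simply produces a new index name, and the index is created when that event is written.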
Jettro Coenradie answered Sep 20 '22 03:09
