logstash output to elasticsearch with document_id; what to do when I don't have a document_id?

I have some logstash input where I use the document_id to remove duplicates. However, most input doesn't have a document_id. The following plumbs the actual document_id through, but if the field doesn't exist, the literal string %{document_id} is used as the ID, which means most documents are seen as duplicates of each other. Here's what my output block looks like:

output {
        elasticsearch_http {
            host => "127.0.0.1"
            document_id => "%{document_id}"
        }
}

I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.

output {
        elasticsearch_http {
            host => "127.0.0.1"
            if document_id {
                document_id => "%{document_id}"
            } 
        }
}

Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
        elasticsearch_http {
    host => "127.0.0.1"
    if 

I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:

if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {
tedder42 asked May 13 '15 23:05

2 Answers

You're close with the conditional idea, but you can't place a conditional inside a plugin block; it has to wrap the plugin block. Do this instead:

output {
  if [document_id] {
    elasticsearch_http {
      host => "127.0.0.1"
      document_id => "%{document_id}"
    } 
  } else {
    elasticsearch_http {
      host => "127.0.0.1"
    } 
  }
}

(But the suggestion in one of the other answers to use the uuid filter is good too.)

Magnus Bäck answered Sep 27 '22 17:09


One way to solve this is to make sure a document_id is always available. You can achieve this by adding a uuid filter in the filter section that creates the document_id field if it is not present.

filter {
    # Only generate an ID when the event has no document_id field
    if ![document_id] {
        uuid {
            target => "document_id"
        }
    }
}
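
Combined with the output block from the question, this gives a single, unconditional output. This is only a sketch: it assumes the uuid filter plugin is available in your Logstash install, and it reuses the elasticsearch_http settings from the question unchanged.

```
filter {
  # Generate an ID only for events that arrive without one.
  if ![document_id] {
    uuid {
      target => "document_id"
    }
  }
}

output {
  # Every event now carries a document_id, so no conditional is needed here.
  elasticsearch_http {
    host => "127.0.0.1"
    document_id => "%{document_id}"
  }
}
```

Events that bring their own document_id are still deduplicated on it, while events without one get a freshly generated ID and are indexed as new documents.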

Edited per Magnus Bäck's suggestion. Thanks!

Val answered Sep 27 '22 16:09
