I have some logstash input where I use the document_id
to remove duplicates. However, most input doesn't have a document_id
. The following plumbs the actual document_id
through, but if it doesn't exist, it gets accepted as literally %{document_id}
, which means most documents are seen as a duplicate of each other. Here's what my output block looks like:
output {
elasticsearch_http {
host => "127.0.0.1"
document_id => "%{document_id}"
}
}
I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.
output {
elasticsearch_http {
host => "127.0.0.1"
if document_id {
document_id => "%{document_id}"
}
}
}
Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
elasticsearch_http {
host => "127.0.0.1"
if
I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:
if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {
To use this configuration, we must also set up Logstash to receive events from Beats. In this setup, the Beat sends events to Logstash. Logstash receives these events by using the Beats input plugin for Logstash and then sends the transaction to Elasticsearch by using the Elasticsearch output plugin for Logstash.
Logstash does not create index on elasticsearch.
The use of Index Lifecycle Management is controlled by the ilm_enabled setting. By default, this setting detects whether the Elasticsearch instance supports ILM, and uses it if it is available. ilm_enabled can also be set to true or false to override the automatic detection, or disable ILM.
You're close with the conditional idea but you can't place it inside a plugin block. Do this instead:
output {
if [document_id] {
elasticsearch_http {
host => "127.0.0.1"
document_id => "%{document_id}"
}
} else {
elasticsearch_http {
host => "127.0.0.1"
}
}
}
(But the suggestion in one of the other answers to use the uuid filter is good too.)
One way to solve this is to make sure a document_id
is always available. You can achieve this by adding a UUID filter in the filter section that would create the document_id
field if it is not present.
filter {
if "" in [document_id] {
uuid {
target => "document_id"
}
}
}
Edited per Magnus Bäck's suggestion. Thanks!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With