I'm trying to parse a large message with Logstash using a file input, a json filter, and an elasticsearch output. 99% of the time this works fine, but when one of my log messages is too large, I get JSON parse errors, as the initial message is broken up into two partial invalid JSON streams. The size of such messages is about 40,000+ characters long. I've looked to see if there is any information on the size of the buffer, or some max length that I should try to stay under, but haven't had any luck. The only answers I found related to the udp input, and being able to change the buffer size.
Does Logstash has a limit size for each event-message? https://github.com/elastic/logstash/issues/1505
This could also be similar to this question, but there were never any replies or suggestions: Logstash Json filter behaving unexpectedly for large nested JSONs
As a workaround, I wanted to split my message up into multiple messages, but I'm unable to do this, as I need all the information to be in the same record in Elasticsearch. I don't believe there is a way to call the Update API from logstash. Additionally, most of the data is in an array, so while I can update an Elasticsearch record's array using a script (Elasticsearch upserting and appending to array), I can't do that from Logstash.
The data records look something like this:
{ "variable1":"value1",
......,
"variable30": "value30",
"attachements": [ {5500 charcters of JSON},
{5500 charcters of JSON},
{5500 charcters of JSON}..
...
{8th dictionary of JSON}]
}
Does anyone know of a way to have Logstash process these large JSON messages, or a way that I can split them up and have them end up in the same Elasticsearch record (using Logstash)?
Any help is appreciated, and I'm happy to add any information needed!
If your elasticsearch
output has a document_id
set, it will update the document (the default action in logstash is to index
the data -- which will update the document if it already exists)
In your case, you'd need to include some unique field as part of your json messages and then rely on that to do the merge in elasticsearch. For example:
{"key":"123455","attachment1":"something big"}
{"key":"123455","attachment2":"something big"}
{"key":"123455","attachment3":"something big"}
And then have an elasticsearch
output like:
elasticsearch {
host => localhost
document_id => "%{key}"
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With