
Logstash: Handling of large messages

I'm trying to parse large messages with Logstash, using a file input, a json filter, and an elasticsearch output. 99% of the time this works fine, but when one of my log messages is too large I get JSON parse errors, because the original message gets broken up into two partial, invalid JSON streams. These messages are roughly 40,000+ characters long. I've looked for any information on the size of the buffer, or some maximum length I should try to stay under, but haven't had any luck; the only answers I found relate to the udp input and being able to change its buffer size.
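
For reference, here is a stripped-down sketch of the kind of pipeline I'm describing (the path, field names, and index name are placeholders, not my actual config):

input {
  file {
    # placeholder path; each line of this file is one JSON document
    path => "/var/log/app/large-events.log"
    start_position => "beginning"
  }
}

filter {
  # parse the raw line into event fields
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    # placeholder host and index
    hosts => ["localhost:9200"]
    index => "app-events"
  }
}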

Related GitHub issue: "Does Logstash has a limit size for each event-message?" (https://github.com/elastic/logstash/issues/1505)

My issue may also be similar to this question, which never got any replies or suggestions: Logstash Json filter behaving unexpectedly for large nested JSONs

As a workaround, I considered splitting the message up into multiple smaller messages, but I can't do that, because I need all of the information to end up in the same Elasticsearch record. I don't believe there is a way to call the Update API from Logstash. Additionally, most of the data is in an array, so while I can append to an Elasticsearch record's array using a script (Elasticsearch upserting and appending to array), I can't do that from Logstash.

The data records look something like this:

{ "variable1":"value1", 
 ......, 
 "variable30": "value30", 
 "attachements": [ {5500 charcters of JSON}, 
                   {5500 charcters of JSON}, 
                   {5500 charcters of JSON}.. 
                   ...
                   {8th dictionary of JSON}]
 }

Does anyone know of a way to have Logstash process these large JSON messages, or a way that I can split them up and have them end up in the same Elasticsearch record (using Logstash)?

Any help is appreciated, and I'm happy to add any information needed!

1 Answer

If your elasticsearch output has a document_id set, it will update the document (the default action in Logstash is to index the data, which creates the document or replaces it if it already exists).

In your case, you'd need to include some unique field as part of your JSON messages and then rely on that to do the merge in Elasticsearch. For example:

{"key":"123455","attachment1":"something big"}
{"key":"123455","attachment2":"something big"}
{"key":"123455","attachment3":"something big"}

And then have an elasticsearch output like:

elasticsearch {
  host => "localhost"
  document_id => "%{key}"
}
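
Note that with the default index action, each of those messages would replace the whole document rather than being merged into it. A possible variation (a sketch, not part of the original answer; it assumes a reasonably recent logstash-output-elasticsearch plugin, which uses hosts instead of host) is to use the update action with doc_as_upsert, so each partial message is merged into the existing document:

elasticsearch {
  hosts       => ["localhost:9200"]
  document_id => "%{key}"
  # "update" merges this event's fields into the document identified by %{key};
  # doc_as_upsert creates the document first if it doesn't exist yet
  action        => "update"
  doc_as_upsert => true
}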