How to use sincedb in Logstash?

I have thousands of log files, and new ones are downloaded every day. I am using Logstash and Elasticsearch for parsing, indexing, and searching.

I am using the file input plugin to read and parse the downloaded files. I have not set sincedb_path, so the registry is stored in $HOME. The problem is that it only reads log files for one day. Here is my input configuration:

input {
  file {
    # Recursively match every file under env1, skipping gzipped archives
    path => "/logs/downloads/apacheLogs/env1/**/*"
    type => "env1"
    exclude => "*.gz"
    # Read newly discovered files from the start, not only new lines
    start_position => "beginning"
  }
  file {
    path => "/logs/downloads/appLogs/env2/**/*"
    type => "env2"
    exclude => "*.gz"
    start_position => "beginning"
  }
}
Asked Jan 20 '14 by Ananda
People also ask

What is sincedb in Logstash?

sincedb_path just needs to point somewhere Logstash has write permission, so it can store its registry of read positions. sincedb_write_interval defines how often Logstash should write that registry to disk; a larger value increases the amount of data that could be re-read if Logstash were to crash.
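
For instance, a minimal sketch of pinning both settings on a file input (the path and interval here are illustrative assumptions, not taken from the question):

input {
  file {
    path => "/logs/downloads/apacheLogs/env1/**/*"
    # Store the registry at an explicit path instead of the default in $HOME
    sincedb_path => "/var/lib/logstash/sincedb-env1"
    # Flush the in-memory read offsets to the sincedb file every 15 seconds
    sincedb_write_interval => 15
  }
}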

How do I read log files in Logstash?

Logstash matches the specified grok patterns against the input logs and extracts the matching lines. You can use the Grok Debugger to test your patterns. In the %{PATTERN:fieldname} syntax, PATTERN is the grok pattern to match and fieldname is the name of the field that holds the parsed data in the output.
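
As a rough illustration of that syntax (the patterns and field names here are hypothetical, not from the question), a grok filter could look like:

filter {
  grok {
    # %{PATTERN:fieldname}: IP, WORD, and URIPATHPARAM are stock grok patterns;
    # clientip, method, and request become fields on the parsed event
    match => { "message" => "%{IP:clientip} %{WORD:method} %{URIPATHPARAM:request}" }
  }
}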

How does Logstash communicate with Elasticsearch?

Logstash receives these events by using the Beats input plugin for Logstash and then sends the events to Elasticsearch by using the Elasticsearch output plugin for Logstash. The Elasticsearch output plugin uses the bulk API, making indexing very efficient.
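
A minimal pipeline sketch of that arrangement (the port, hosts, and index name are assumptions for illustration):

input {
  beats {
    # Filebeat and other Beats ship events to this port
    port => 5044
  }
}
output {
  elasticsearch {
    # Events are sent via the bulk API to these nodes
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}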


1 Answer

This is apparently caused by a bug in the File handler.

When the file{} input reads a log file, the offset of the last byte processed is saved in an in-memory table and periodically copied out to the sincedb file. While you can set the sincedb file to /dev/null if you want, Logstash reads that file only during startup and uses the in-memory table afterwards.

The problem is that the table in memory indexes position by inode, and is never pruned, even if it detects that a given file no longer exists. If you delete a file and then add a new one -- even if it has a different name -- it may well have the same inode number, and the File handler will think it is the same file.

If the new file is larger, then the handler will only read from the previous max byte onwards and update the table. If the new file is smaller, then it seems to think the file was somehow truncated, and may start processing again from the default position.

As a result, the only way to handle this is to set sincedb_path to /dev/null and restart Logstash (causing the internal table to be lost); all files matching the pattern will then be re-read from the beginning, as in the sketch below. This has problems as well, since some of those files may not be new.
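
A sketch of that workaround applied to the question's first input (only the sincedb_path line is new):

input {
  file {
    path => "/logs/downloads/apacheLogs/env1/**/*"
    type => "env1"
    exclude => "*.gz"
    start_position => "beginning"
    # Discard the registry, so every restart re-reads all matching files from the beginning
    sincedb_path => "/dev/null"
  }
}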

Answered by Steve Shipway