I am working on a setup in which Logstash watches a specific local directory for JSON files to parse and forward to Elasticsearch. Those files are generated daily and placed in the directory Logstash is monitoring, so there will be a new, uniquely named JSON file every day.
My input is as such:
input {
    file {
        path => "/home/path_to_json/*.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
My question is: how do I configure Logstash to ingest only the latest/newest file, and not everything else in the directory every time a new file gets dumped, so that it will not duplicate data in Elasticsearch? Is this the default behavior of the file plugin? Or should I set up anything new in my input?
Thanks in advance!
Setting sincedb_path to /dev/null means Logstash never persists its read positions, so on every restart it forgets what it has already processed and reads everything from the beginning again. You probably want to remove this line.

Setting start_position to end will make it consider only lines/files added after Logstash was started (the first time it sees each file).

With these two changes you should only get new data ingested.
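Putting both changes together, the input block might look like the sketch below (the path is carried over from your question; note that "end" is also the file plugin's default start_position, so that line is shown only for clarity):

input {
    file {
        path => "/home/path_to_json/*.json"
        # Read only data appended after Logstash first sees a file;
        # "end" is the plugin's default, shown here explicitly.
        start_position => "end"
        # No sincedb_path override: the plugin keeps its own sincedb,
        # so read positions survive restarts and files are not re-read.
    }
}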