I have thousands of logs files and it gets downloaded everyday. I am using logstash and ElasticSearch for parsing, indexing and searching.
Now I am using file input plugin for reading downloaded files and parsing it. I have not set sincedb_path so its storing in $HOME. But the problem is it reads log files for just one day. Here is my configuration for input: 
input {
  file {
    path => "/logs/downloads/apacheLogs/env1/**/*"
    type => "env1"
    exclude => "*.gz"
    start_position => "beginning"
  }
  file {
    path => "/logs/downloads/appLogs/env2/**/*"
    type => "env2"
    exclude => "*.gz"
    start_position => "beginning"
  }
}  
sincedb_path just needs to be a directory where logstash has write permission for the registry. sincedb_write_interval defines how often logstash should write the sincedb registry. A larger value puts you at risk in logstash were to crash.
Logstash searches for the specified GROK patterns in the input logs and extracts the matching lines from the logs. You can use GROK debugger to test your GROK patterns. Here, PATTERN represents the GROK pattern and the fieldname is the name of the field, which represents the parsed data in the output.
Logstash receives these events by using the Beats input plugin for Logstash and then sends the transaction to Elasticsearch by using the Elasticsearch output plugin for Logstash. The Elasticsearch output plugin uses the bulk API, making indexing very efficient.
This is apparently caused by a bug in the File handler.
When File{} input method reads a log file, the last byte processed is being saved and periodically copied out to the sincedb file. While you can set the file to be /dev/null if you want, Logstash reads the file only during start up and uses the information from table in memory after.
The problem is that the table in memory indexes position by inode, and is never pruned, even if it detects that a given file no longer exists. If you delete a file and then add a new one -- even if it has a different name -- it may well have the same inode number, and the File handler will think it is the same file.
If the new file is larger, then the handler will only read from the previous max byte onwards and update the table. If the new file is smaller, then it seems to think the file was somehow truncated, and may start processing again from the default position.
As a result, the only way to handle things is to set sincedb to be /dev/null, and then restart logstash (causing the internal table to be lost) and then all the files matching the pattern will be re-read from the beginning - and this has problems as well, since some of the files may not be new.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With