Logstash + Elasticsearch: reloads the same data

I managed to get Logstash (1.3.1) to send data to Elasticsearch (0.9.5).

My Logstash config file is:

input {
  file {
    path => ["D:/apache-tomcat-7.0.5/logs/*.*"]
  }
}
output {
  stdout { }
  elasticsearch_http {
    host => "localhost"
    port => 9200
  }
}

The data is stored in Elasticsearch under the index logstash-2013.12.xx.

However, if I restart Logstash, say the next day, the same data is reloaded into a new index. If I restart it again, the document count in the index doubles.

It seems that Logstash is re-reading the data and Elasticsearch is storing it again as duplicate documents.

Is there a way to stop Logstash from reloading the data, to stop Elasticsearch from duplicating it, or both?

Asked by Samant, Jan 12 '23 16:01


1 Answer

I ran into this issue with Logstash 1.3.3 as well. The relevant bug report in the Logstash Jira is LOGSTASH-429, "File Input - .sincedb file is broken on Windows". Boyd Meier has also created a patch for it.

The patch has been pulled into Jordan Sissel's ruby-filewatch Git repository for inclusion in a later version, but it hasn't made it into a release yet.

The issue comes from Logstash identifying files by their inode, which is always 0 on Windows, so the sincedb cannot tell files apart. Boyd Meier's patch works around this by using the Windows file ID as the identifier instead. The file ID stays the same until the file is deleted from the volume.
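To make the failure mode concrete, here is a simplified Ruby model (not the actual ruby-filewatch code) of a position table keyed by [inode, dev_major, dev_minor]. The key layout is an assumption based on the sample sincedb entry quoted later in this answer, and the file names, offsets, and file-ID strings are made up for illustration.

```ruby
# Simplified model, NOT the real ruby-filewatch code: a table remembering how
# far each file has been read, keyed by [inode, dev_major, dev_minor].

# Build the table from the files' current read positions.
def record_positions(files)
  files.each_with_object({}) do |f, db|
    # If two files share a key, the later one overwrites the earlier offset.
    db[[f[:inode], f[:major], f[:minor]]] = f[:pos]
  end
end

# Offset a restarted reader would resume from (0 = read from the start).
def resume_position(db, f)
  db.fetch([f[:inode], f[:major], f[:minor]], 0)
end

catalina  = { name: "catalina.log",  pos: 5_000 }
localhost = { name: "localhost.log", pos: 120 }

# Unpatched on Windows: every file reports inode 0 (device numbers are placeholders).
unpatched = [catalina, localhost].map { |f| f.merge(inode: "0", major: 0, minor: 0) }

# Patched: each file gets its own file-ID string (made-up values).
patched = [
  catalina.merge(inode: "1111-2222-3333", major: 0, minor: 0),
  localhost.merge(inode: "1111-2222-4444", major: 0, minor: 0),
]

db = record_positions(unpatched)
puts db.size                              # => 1: both files share one entry
puts resume_position(db, unpatched.first) # => 120: catalina.log is re-read from byte 120

db = record_positions(patched)
puts db.size                              # => 2
puts resume_position(db, patched.first)   # => 5000
```

With a shared key, a restart resumes each file from whatever offset was written last, which is one way already-shipped lines end up being sent to Elasticsearch again.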

If you're comfortable doing a bit of patching, you can pull the change in from Jordan Sissel's ruby-filewatch Git repository. For 1.3.3, which I have only just patched and am still testing against sample log files, the steps were:

  1. Download the ruby-filewatch zip file from GitHub (Jordan Sissel's ruby-filewatch Git repository).
  2. Unzip the downloaded file to a new directory.
  3. I had to edit line 10 of ruby-filewatch\lib\filewatch\tail.rb, which reads require "JRubyFileExtension.jar", and change it to require "java/JRubyFileExtension.jar". Without this change I got an error that the jar file could not be found when Logstash tried to read a file. For reference, the complete line then reads: require "java/JRubyFileExtension.jar" if defined? JRUBY_VERSION
  4. Open the logstash-1.3.3-flatjar.jar file in 7-Zip.
  5. Drag and drop the java directory from ruby-filewatch into the root folder of the jar in 7-Zip.
  6. Drag and drop all the files from the ruby-filewatch\lib\filewatch folder into the filewatch folder in 7-Zip, overwriting any existing files.
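Step 3 can also be scripted. This is a minimal Ruby sketch, assuming you pass it the directory where you unzipped ruby-filewatch; the helper name patch_tail_rb and the script name are hypothetical. It only rewrites the require line in lib/filewatch/tail.rb, so steps 4 to 6 are still done by hand in 7-Zip.

```ruby
# Hypothetical helper automating step 3: point the jar require in
# lib/filewatch/tail.rb at the java/ subdirectory.

OLD_REQUIRE = 'require "JRubyFileExtension.jar"'
NEW_REQUIRE = 'require "java/JRubyFileExtension.jar"'

# Returns true if the file was changed, false if it was already patched.
def patch_tail_rb(filewatch_root)
  path = File.join(filewatch_root, "lib", "filewatch", "tail.rb")
  source = File.read(path)
  return false if source.include?(NEW_REQUIRE)

  unless source.include?(OLD_REQUIRE)
    raise "#{path}: expected line #{OLD_REQUIRE} not found"
  end

  # Replace only the first occurrence; the rest of the line is kept.
  File.write(path, source.sub(OLD_REQUIRE, NEW_REQUIRE))
  true
end

# Usage: ruby patch_tail.rb <ruby-filewatch directory>
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts patch_tail_rb(ARGV[0]) ? "patched" : "already patched"
end
```

Running it twice is harmless: the second run detects the new require and leaves the file alone.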

Now when you run it against multiple log files, you should find that the sincedb contains more than one entry, and the entries look similar to 1717916447-2604966-851968 0 2 428312038. If you're having trouble finding the sincedb file and haven't set sincedb_path in your config file, it can be found in the home directory of the user running the jar. If that is your user, you can get there easily by pressing Windows key + R, typing %USERPROFILE%, and clicking OK.
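To check the result, a small Ruby sketch can list the entries in each sincedb file. It assumes the default files live in the home directory with names starting with .sincedb, and that each line has four fields (inode or file ID, device major, device minor, byte offset), as in the sample entry above; treat both as assumptions rather than a documented format. An inode of 0 suggests the unpatched behaviour.

```ruby
# Sketch: print sincedb entries and flag inode-0 keys.
# ASSUMED line format: "<inode> <dev_major> <dev_minor> <byte offset>".

SincedbEntry = Struct.new(:inode, :major, :minor, :pos)

def parse_sincedb(text)
  text.each_line.map(&:split).reject(&:empty?).map do |fields|
    raise ArgumentError, "unexpected line: #{fields.join(' ')}" unless fields.size == 4
    inode, major, minor, pos = fields
    SincedbEntry.new(inode, Integer(major), Integer(minor), Integer(pos))
  end
end

def report(entries)
  entries.each do |e|
    note = e.inode == "0" ? "  <- inode 0: looks unpatched" : ""
    puts "#{e.inode} dev=#{e.major},#{e.minor} pos=#{e.pos}#{note}"
  end
end

if __FILE__ == $PROGRAM_NAME
  # Forward slashes so Dir.glob works with Windows paths such as %USERPROFILE%.
  home = (ENV["HOME"] || ENV["USERPROFILE"] || Dir.home).tr("\\", "/")
  paths = ARGV.empty? ? Dir.glob(File.join(home, ".sincedb*")) : ARGV
  paths.each do |path|
    puts "== #{path}"
    begin
      report(parse_sincedb(File.read(path)))
    rescue ArgumentError => e
      puts "  skipped: #{e.message}"
    end
  end
end
```

You can also pass explicit file paths, for example the value you set for sincedb_path.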

As always, take care when patching, and test thoroughly before deploying to production systems.

Answered by Garth McCormack, Jan 17 '23 13:01