Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selective parsing of csv file using logstash

I am trying to feed data into elasticsearch from csv files, through logstash. These csv files contain the first row as the column names. Is there any particular way to skip that row while parsing the file? Are there any conditionals/filters that I could use such that in case of exception it would skip to the next row??

my config file looks like:

input {  
      file {
          path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
          type => "promosms_dec15"
          start_position => "beginning"
          sincedb_path => "/dev/null"
      }
}
filter {

    csv {
        columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
        separator => ","
    }  
    ruby {
          code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
    }

}
output {  
    elasticsearch { 
        action => "index"
        host => "localhost"
        index => "promosms-%{+dd.MM.YYYY}"
        workers => 1
    }
}

The first few rows of my csv file looks like

"Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"
"","No","FMN1191MVHV","31/03/2014"
"","No","FMN1191N64G","31/03/2014"
"","No","FMN1192OPMY","31/03/2014"

Is there anyway I could skip the first line? Also, if my csv file ends with a new line, with nothing in it, then also I get an error. How do I skip those new lines if they come at the end of the file or if thre is an empty row between 2 rows?

like image 903
Sagnik Sinha Avatar asked Dec 17 '14 07:12

Sagnik Sinha


1 Answers

A simple way to do it would be to add the following to your filter (after csv, before ruby):

if [Comm_Plan] == "Comm_Plan" {
  drop { }
}

Assuming the field would never normally have the same value as the column heading, it should work as expected, however, you could be more specific by using:

if [Comm_Plan] == "Comm_Plan" and [Queue_Booking] == "Queue_Booking" and [Order_Reference] == "Order_Reference" and [Generation_Date] == "Generation_Date" {
  drop { }
}

All this would do would be to check to see if the field value had that particular value and if it did, drop the event.

like image 52
Rumbles Avatar answered Sep 23 '22 01:09

Rumbles