
Logstash grok filter doesn't work for the last field

With Logstash 2.3.3, the grok filter doesn't capture the last field.

To reproduce the problem, create test.conf as follows:

input {
  file {
    path => "/Users/izeye/Applications/logstash-2.3.3/test.log"
  }
}

filter {
  grok {
    match => { "message" => "%{DATA:id1},%{DATA:id2},%{DATA:id3},%{DATA:id4},%{DATA:id5}" }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Run ./bin/logstash -f test.conf

and, after it has started, run echo "1,2,3,4,5" >> test.log in another terminal.

I got the following output:

Johnnyui-MacBook-Pro:logstash-2.3.3 izeye$ ./bin/logstash -f test.conf 
Settings: Default pipeline workers: 8
Pipeline main started
{
       "message" => "1,2,3,4,5",
      "@version" => "1",
    "@timestamp" => "2016-07-07T07:57:42.830Z",
          "path" => "/Users/izeye/Applications/logstash-2.3.3/test.log",
          "host" => "Johnnyui-MacBook-Pro.local",
           "id1" => "1",
           "id2" => "2",
           "id3" => "3",
           "id4" => "4"
}

As you can see, id5 is missing.

I'm not sure whether this is a bug or a misconfiguration.

Any hint will be appreciated.

Johnny Lim asked Jul 07 '16 08:07


1 Answer

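I think it is because of how the DATA pattern is defined. Its regex is .*?, which is a lazy match: it consumes as few characters as possible. Nothing follows the last %{DATA:id5} in your pattern, so it can match the empty string and stop there, and grok drops empty captures by default. It's not a bug; it's how regex works.
You might want to ask a regex question to get a more precise answer, though.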
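One way to check this explanation, as a sketch I have not run against 2.3.3: the grok filter has a keep_empty_captures option (false by default). Assuming it behaves as documented, turning it on should make id5 appear in the output as an empty string instead of being dropped.

```
filter {
  grok {
    match => { "message" => "%{DATA:id1},%{DATA:id2},%{DATA:id3},%{DATA:id4},%{DATA:id5}" }
    # Assumption: with this enabled, the empty match for id5 is kept
    # as an empty-string field rather than silently discarded.
    keep_empty_captures => true
  }
}
```

This is only for diagnosis; it does not make id5 capture "5".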

As a solution, you can replace the last DATA with NUMBER (or something appropriate to your situation). GREEDYDATA would also work.
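A minimal sketch of that change, reusing the field names from the question:

```
filter {
  grok {
    # GREEDYDATA is .* (greedy), so it consumes the rest of the line
    # and id5 should capture "5" for the input "1,2,3,4,5".
    match => { "message" => "%{DATA:id1},%{DATA:id2},%{DATA:id3},%{DATA:id4},%{GREEDYDATA:id5}" }
  }
}
```

Another option, which is my own suggestion and not required by grok, is to keep DATA and anchor the pattern with $ at the end ("...,%{DATA:id5}$"). The lazy match then has to extend to the end of the line for the whole pattern to match.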


In this situation, though, the csv or dissect filter might be a better fit, since they are easier to configure and more performant.
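For instance, a csv filter for the same line could look roughly like this. The column names are taken from the question; the filter reads the message field by default, and the separator shown is already the default, included only for clarity.

```
filter {
  csv {
    # Split the "message" field on commas and name the resulting columns.
    separator => ","
    columns => ["id1", "id2", "id3", "id4", "id5"]
  }
}
```

Whether the dissect filter is available depends on your Logstash version and installed plugins, so check that before relying on it.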

baudsp answered Sep 18 '22 16:09