
Parse json in a list in logstash

I have JSON of the form:

[
    {
        "foo":"bar"
    }
]

I am trying to filter it using the json filter in Logstash, but it doesn't seem to work. It appears the json filter cannot parse a top-level JSON array (list). Can someone please tell me about a workaround for this?

UPDATE

My logs

IP - - 0.000 0.000 [24/May/2015:06:51:13 +0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium+S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT%2B05%3A30&events=%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%2C%22eA%22%3A%22app_activated%22%2C%22eTz%22%3A%22GMT%2B05%3A30%22%2C%22eT%22%3A%221432386324909%22%2C%22eL%22%3A%22packageName%22%7D%5D * "-" "-" "-"

URL decoded version of the above log is

IP - - 0.000 0.000 [24/May/2015:06:51:13  0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT+05:30&events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}] * "-" "-" "-"
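For reference, the percent-decoding step can be reproduced in plain Ruby (a minimal sketch; the string below is a shortened sample of the encoded events parameter, used here for illustration only):

```ruby
require 'cgi'

# Shortened sample of the url-encoded events parameter (illustration only)
encoded = '%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%7D%5D'

# CGI.unescape performs the same percent-decoding as Logstash's urldecode filter
decoded = CGI.unescape(encoded)
# decoded == '[{"eV":"com.olx.southasia","eC":"appUpdate"}]'
```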

Please find below my config file for the above logs:

filter {
    urldecode {
        field => "message"
    }
    grok {
        match => ["message", '%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
    }
    kv {
        field_split => "&? "
    }
    json {
        source => "events"
    }
    geoip {
        source => "clientip"
    }
}

I need to parse the events field, i.e. events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]

asked Aug 03 '15 by Keshav Agarwal


1 Answer

I assume that you have your JSON in a file. You are right, you cannot use the json filter directly. You'll have to use the multiline codec and apply the json filter afterwards.

The following config works for your given input. However, you might have to change it in order to properly separate your events. It depends on your needs and the json format of your file.

Logstash config:

input {
    file {
        codec => multiline {
            pattern => "^\]" # Change to separate events
            negate => true
            what => previous
        }
        path => ["/absolute/path/to/your/json/file"]
        start_position => "beginning"
        sincedb_path => "/dev/null" # This is just for testing
    }
}

filter {
    mutate {
        gsub => [ "message", "\[", "", "message", "\n", "" ]
    }
    json { source => "message" }
}

UPDATE

After your update I guess I've found the problem. Apparently you get a _jsonparsefailure because of the square brackets. As a workaround you can remove them manually. Add the following mutate filter after your kv and before your json filter:

mutate {
    gsub => [ "events", "\]", "", "events", "\[", "" ]
}

UPDATE 2

Alright, assuming your input looks like this:

[{"foo":"bar"},{"foo":"bar1"}]

Here are five options:

Option a) ugly gsub

An ugly workaround would be another gsub:

gsub => [ "event","\},\{",","]

But this would merge all objects into one and lose the inner relations, so you probably don't want to do that.
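To see why, here is a plain-Ruby sketch (not Logstash) of what that gsub does to the sample input: the two objects are merged into one with a duplicate key, and the parser keeps only the last value.

```ruby
require 'json'

raw    = '[{"foo":"bar"},{"foo":"bar1"}]'
merged = raw.gsub('},{', ',')   # => '[{"foo":"bar","foo":"bar1"}]'

# Ruby's JSON parser silently keeps the last duplicate key,
# so the first foo/bar pair is lost:
result = JSON.parse(merged)     # => [{"foo"=>"bar1"}]
```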

Option b) split

A better approach might be to use the split filter:

split {
    field => "event"
    terminator => ","
}
mutate {
    gsub => [ "event", "\]", "", "event", "\[", "" ]
}
json {
    source => "event"
}

This would generate multiple events. (The first with foo = bar and the second with foo = bar1.)
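The effect of the split-then-strip approach can be sketched in plain Ruby. Note the comma split only works as long as each object contains a single key; a comma inside an object would break it.

```ruby
require 'json'

raw   = '[{"foo":"bar"},{"foo":"bar1"}]'
parts = raw.gsub(']', '').gsub('[', '').split(',')
# parts == ['{"foo":"bar"}', '{"foo":"bar1"}']

# Each part becomes its own event and parses cleanly on its own
events = parts.map { |p| JSON.parse(p) }
# events == [{"foo"=>"bar"}, {"foo"=>"bar1"}]
```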

Option c) mutate split

You might want to have all the values in one Logstash event. You could use the mutate split filter to generate an array and parse the JSON of each entry if it exists. Unfortunately, you will have to write a conditional for each entry because Logstash doesn't support loops in its config.

mutate {
    gsub => [ "event", "\]", "", "event", "\[", "" ]
    split => [ "event", "," ]
}

json {
    source => "event[0]"
    target => "result[0]"
}

if [event][1] {
    json {
        source => "event[1]"
        target => "result[1]"
    }
    if [event][2] {
        json {
            source => "event[2]"
            target => "result[2]"
        }
    }
    # You would have to add more conditionals if you expect even more dictionaries
}

Option d) Ruby

According to your comment I tried to find a ruby way. The following works (after your kv filter):

mutate {
    gsub => [ "event", "\]", "", "event", "\[", "" ]
}

ruby {
    init => "require 'json'"
    code => "
        e = event['event'].split(',')
        ary = Array.new
        e.each do |x|
            hash = JSON.parse(x)
            hash.each do |key, value|
                ary.push( { key => value } )
            end
        end
        event['result'] = ary
    "
}
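The core of that ruby filter can be tried standalone (a sketch: `event['event']` is replaced by a local string whose brackets have already been stripped, as the mutate filter would do). The same single-key-per-object caveat as above applies, since the code splits on commas.

```ruby
require 'json'

# Value of the event field after the bracket-stripping mutate filter
stripped = '{"foo":"bar"},{"foo":"bar1"}'

ary = []
stripped.split(',').each do |x|
  JSON.parse(x).each { |key, value| ary.push({ key => value }) }
end
# ary == [{"foo"=>"bar"}, {"foo"=>"bar1"}]
```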

Option e) Ruby

Use this approach after your kv filter (without setting a mutate filter):

ruby {
    init => "require 'json'"
    code => "event['result'] = JSON.parse(event['event'])"
}

It will parse events like event=[{"name":"Alex","address":"NewYork"},{"name":"David","address":"NewJersey"}]

into:

"result" => [
    [0] {
           "name" => "Alex",
        "address" => "NewYork"
    },
    [1] {
           "name" => "David",
        "address" => "NewJersey"
    }

Due to the behavior of the kv filter, this does not support whitespace inside the JSON. I hope you don't have any in your real inputs.
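A quick plain-Ruby illustration of the whitespace problem: field_split => "&? " tokenizes on &, ? and space, so a space inside the JSON value truncates it before the json or ruby filter ever sees it.

```ruby
# Emulate kv's field_split => "&? " tokenization on a value containing a space
tokens = 'events={"name": "Alex"}'.split(/[&? ]/)
# The JSON value is cut off at the space after the colon:
# tokens == ['events={"name":', '"Alex"}']
```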

answered Sep 24 '22 by hurb