
Logstash: XML to JSON output from array to string

I am trying to use Logstash to convert XML into JSON for Elasticsearch. I am able to get the values read and sent to Elasticsearch. The issue is that all the values come out as arrays, and I would like them to come out as plain strings. I know I can do a replace for each field individually, but then I run into an issue with nested fields that are 3 levels deep.

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
    <acs2:locationId>Location Id</acs2:locationId>
    <acs2:userId>User Id</acs2:userId>
    <acs2:TestResult>
        <acs1:CreatedBy>My Name</acs1:CreatedBy>
        <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
        <acs1:Output>10.5</acs1:Output>
    </acs2:TestResult>
</acs2:SubmitTestResult>

Logstash Config

input {
    file {
        path => "/var/log/logstash/test.xml"
    }
}
filter {
    multiline {
        pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
        what => "previous"
    }
    if "multiline" in [tags] {
        mutate {
            replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
        }
        xml {
            target => "SubmitTestResult"
            source => "message"
        }
        mutate {
            remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
            remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]

            # This works
            replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]

            # This does NOT work
            replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
        }
    }
}
output {
    stdout {
        codec => "rubydebug"
    }
    elasticsearch {
        index => "xmltest"
        cluster => "logstash"
    }
}

Example Output

{
   "_index": "xmltest",
   "_type": "logs",
   "_id": "AU8IZBURkkRvuur_3YDA",
   "_version": 1,
   "found": true,
   "_source": {
      "SubmitTestResult": {
         "locationId": "Location Id",
         "userId": [
            "User Id"
         ],
         "TestResult": [
            {
               "CreatedBy": [
                  "My Name"
               ],
               "CreatedDate": [
                  "2015-08-07"
               ],
               "Output": [
                  "10.5"
               ]
            }
         ]
      }
    }
}

As you can see, every element comes out as an array (except locationId, which I handled with a replace). I would rather not add a replace for each element. Is there a way to adjust the config so the values come out as plain strings? If not, how do I reach a field 3 levels deep in the replace?

--UPDATE--

I figured out how to get to the 3rd level inside TestResult. The replace is:

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
Ascalonian asked Nov 10 '22 07:11


1 Answer

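I figured it out. As the example output shows, the xml filter wraps TestResult in an array, so the field reference needs the array index [0] before the nested field name: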

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
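A possible alternative to a per-field replace, not part of the original answer: the xml filter has a force_array option, and setting it to false should stop single elements from being wrapped in arrays in the first place. This is a minimal sketch that assumes the installed version of the xml filter plugin supports force_array (older releases may not); the source and target values are taken from the config in the question.

```
filter {
    xml {
        source => "message"
        target => "SubmitTestResult"
        # Assumption: this plugin version supports force_array.
        # false keeps single elements as plain values instead of
        # one-element arrays; the plugin default is true.
        force_array => false
    }
}
```

If this applies to your setup, the nested field should then be reachable as [SubmitTestResult][TestResult][CreatedBy] without the [0] index. Elements that actually repeat in the XML would still come out as arrays.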
Ascalonian answered Nov 15 '22 05:11