Logstash grok filter help - fixed position file

I have a fixed-position (column-based) file with no delimiter separating the fields. Each field has its own start position and length. Here is an example of the data:

520140914191193386---------7661705508623855646---1595852965---133437--the lazy fox jumping over-----------------------212.75.12.85---

While I used dashes (-) in the sample above, the actual file contains spaces wherever a field is shorter than the width allowed by the schema.

The schema in this case is:

UserID (start position 1, length 27)
SystemID (start position 28, length 22)
SampleID (start position 50, length 13)
LineID (start position 63, length 8)
Text (start position 71, length 48)
IP (start position 119, length 15)

Ideally, I would get the following field values in Logstash (without trailing spaces):

UserID:520140914191193386
SystemID:7661705508623855646
SampleID:1595852965
LineID:133437
Text:the lazy fox jumping over
IP:212.75.12.85

How do I parse this kind of file with grok?

asked Sep 14 '14 by DoiT International


1 Answer

I'd go for a two-step process:

  • Split data into fields
  • Strip trailing whitespace from the end of each field

Since each field has a known length, you can use a regex pattern like .{27} to match them.

In grok, you can name a field like so: (?<user_id>.{27})

You can test a full pattern in the grok debugger, but something like this should achieve a length-based split:

(?<user_id>.{27})(?<system_id>.{22})(?<sample_id>.{13})(?<line_id>.{8})(?<text>.{48})(?<ip>.{15})

You mentioned that your extra characters are all whitespace, so you can clean that up using the mutate filter with a strip option.

All together, that might look something like this:

filter {
    grok {
        match => ["message", "(?<user_id>.{27})(?<system_id>.{22})(?<sample_id>.{13})(?<line_id>.{8})(?<text>.{48})(?<ip>.{15})"]
    }

    mutate {
        strip => [
            "user_id",
            "system_id",
            "sample_id",
            "line_id",
            "text",
            "ip"
        ]
    }
}
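
If it helps to try this end to end, here is a minimal sketch of a complete pipeline built around that filter. The stdin input and rubydebug stdout output are assumptions added purely for local testing, not part of the original setup; swap them for your real input and output sections.

input {
    stdin { }   # assumption: paste a sample line to test; use your real file input in production
}

filter {
    grok {
        # split the line into fixed-width fields by position
        match => ["message", "(?<user_id>.{27})(?<system_id>.{22})(?<sample_id>.{13})(?<line_id>.{8})(?<text>.{48})(?<ip>.{15})"]
    }

    mutate {
        # remove the padding spaces from each extracted field
        strip => ["user_id", "system_id", "sample_id", "line_id", "text", "ip"]
    }
}

output {
    stdout { codec => rubydebug }   # assumption: print the parsed event so the fields can be checked
}

Feeding in the sample line from the question, the resulting event should contain user_id 520140914191193386, system_id 7661705508623855646, sample_id 1595852965, line_id 133437, text "the lazy fox jumping over" and ip 212.75.12.85, with the padding stripped.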
answered Sep 19 '22 by rutter