
Remove Field from event by pattern

So I'm using a standard ELK stack to analyse Apache access logs, which is working well, but I'm looking to break out URL parameters as fields, using the KV filter, in order to allow me to write better queries.

My problem is that the app I'm analysing has 'cache-busting' dynamically generated parameters, which leads to tens of thousands of 'fields', each occurring once. Elasticsearch seems to have severe trouble with this, and the fields have no value to me, so I'd like to remove them. Below is an example of the pattern:

GET /page?rand123PQY=ABC&other_var=something
GET /page?rand987ZDQ=DEF&other_var=something

In the example above, the parameters I want to remove start with 'rand'. Currently my logstash.conf uses grok to extract fields from the access logs, followed by kv to extract query string parameters:

filter {
  grok {
    path => "/var/log/apache/access.log"
    type => "apache-access"
  }
  kv {
    field_split => "&?"
  }
}

Is there a way I can filter out any fields matching the pattern rand[A-Z0-9]*=[A-Z0-9]*? Most examples I've seen are targeting fields by exact name, which I cannot use. I did wonder about regexing the request field into a new field, running KV on that, then removing it. Would that work?

barnyr asked Dec 19 '22 08:12



2 Answers

If the set of fields that you are interested in is known and well-defined, you could set the target option of the kv filter so that the parsed pairs land in a nested field, move the interesting fields to the top level of the message with a mutate filter, and then delete the field holding the nested key/value pairs. I think this is pretty much what you suggested at the end.
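
A rough sketch of that first approach might look like the following. It assumes the query string is in a field called request (as the question implies) and that other_var is one of the parameters worth keeping; the nested field name params is made up for illustration.

```
filter {
  # Parse the key/value pairs into a nested field rather than the top level.
  kv {
    source => "request"
    field_split => "&?"
    target => "params"        # hypothetical field name
  }
  # Promote only the parameters of interest to the top level.
  mutate {
    rename => { "[params][other_var]" => "other_var" }
  }
  # Drop the nested field, which takes the rand* keys with it.
  mutate {
    remove_field => ["params"]
  }
}
```

The rename and the removal sit in separate mutate blocks so that the order of the two steps is explicit. Each additional parameter you want to keep needs its own entry in rename.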

Alternatively you could use a ruby filter:

filter {
  ruby {
    code => "
      event.to_hash.keys.each { |k|
        if k.start_with?('rand')
          event.remove(k)
        end
      }
    "
  }
}
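
If a plain prefix check is too broad (it would also remove a legitimate field such as random_seed, to pick a hypothetical name), the same loop can test each key against the pattern from the question instead. This is only a sketch of that variation, using the same event calls as the filter above:

```
filter {
  ruby {
    code => "
      event.to_hash.keys.each { |k|
        # Remove only keys that are rand followed by uppercase letters or digits
        event.remove(k) if k =~ /^rand[A-Z0-9]*$/
      }
    "
  }
}
```

Like the original, this only looks at top-level field names, so it needs to run after kv has written the parameters to the top level of the event.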
Magnus Bäck answered Jan 01 '23 03:01



I know this is dated and has been answered, but for anyone looking into it as of 2017: there's a plugin named prune that allows you to trim fields based on different criteria, including patterns.

prune {
    blacklist_names => ["[0-9]+", "unknown_fields", "tags"]
}
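
Applied to the question, a minimal sketch would be something like the block below. The regular expression is my assumption about what the generated parameter names look like, based on the two example requests; adjust it to the real names.

```
filter {
  prune {
    # Each entry is a regular expression matched against field names.
    blacklist_names => ["^rand[A-Z0-9]*$"]
  }
}
```

As far as I know, prune only works on top-level fields, so it has to come after the kv filter and the parameters must not be nested under a target.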
Kelvin answered Jan 01 '23 03:01
