
fluentd not parsing JSON log file entry

I've seen a number of similar questions on Stack Overflow, including this one, but none address my particular issue.

The application is deployed in a Kubernetes (v1.15) cluster. I'm using a docker image based on the fluent/fluentd-docker-image GitHub repo, v1.9/armhf, modified to include the elasticsearch plugin. Elasticsearch and Kibana are both version 7.6.0.

The logs are going to stdout and look like:

{"Application":"customer","HTTPMethod":"GET","HostName":"","RemoteAddr":"10.244.4.154:51776","URLPath":"/customers","level":"info","msg":"HTTP request received","time":"2020-03-10T20:17:32Z"}

In Kibana I'm seeing something like this:

{
  "_index": "logstash-2020.03.10",
  "_type": "_doc",
  "_id": "p-UZxnABBcooPsDQMBy_",
  "_version": 1,
  "_score": null,
  "_source": {
    "log": "{\"Application\":\"customer\",\"HTTPMethod\":\"GET\",\"HostName\":\"\",\"RemoteAddr\":\"10.244.4.154:46160\",\"URLPath\":\"/customers\",\"level\":\"info\",\"msg\":\"HTTP request received\",\"time\":\"2020-03-10T20:18:18Z\"}\n",
    "stream": "stdout",
    "docker": {
      "container_id": "cd1634b0ce410f3c89fe63f508fe6208396be87adf1f27fa9d47a01d81ff7904"
    },
    "kubernetes": {

I'm expecting to see the JSON pulled out of the log value, somewhat like this (abbreviated):

{
  "_index": "logstash-2020.03.10",
  ...
  "_source": {
    "log": "...",   
    "Application":"customer",
    "HTTPMethod":"GET",
    "HostName":"",
    "RemoteAddr":"10.244.4.154:46160",
    "URLPath":"/customers",
    "level":"info",
    "msg":"HTTP request received",
    "time":"2020-03-10T20:18:18Z",
    "stream": "stdout",
    "docker": {
      "container_id": "cd1634b0ce410f3c89fe63f508fe6208396be87adf1f27fa9d47a01d81ff7904"
    },
    "kubernetes": {

My fluentd config is:

<match fluent.**>
  @type null
</match>

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

<match kubernetes.var.log.containers.**fluentd**.log>
  @type null
</match>
<match kubernetes.var.log.containers.**kube-system**.log>
  @type null
</match>
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

<match **>
   @type elasticsearch
   @id out_es
   @log_level info
   include_tag_key true
   host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
   port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
   path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"
   <format>
      @type json
   </format>
</match>

I'm sure I'm missing something. Can anyone point me in the right direction?

Thanks, Rich

Rich asked Jan 26 '23 06:01



1 Answer

This config worked for me:

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /opt/bitnami/fluentd/logs/buffers/fluentd-docker.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_key time
    time_format %iso8601
  </parse>
</source>

<filter kubernetes.**>
  @type parser
  key_name "$.log"
  hash_value_field "log"
  reserve_data true
  <parse>
    @type json
  </parse> 
</filter>

<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

Make sure to edit path and pos_file so that they match your setup.

This happens because the Docker log files under /var/log/containers/*.log store each line of container stdout as a string under the 'log' key. Your application's JSON is therefore serialized into a string before it is written, and the JSON parser in the tail source only decodes Docker's outer wrapper. What you need is an additional step that parses the string under the 'log' key:

<filter kubernetes.**>
  @type parser
  key_name "$.log"
  hash_value_field "log"
  reserve_data true
  <parse>
    @type json
  </parse> 
</filter>
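
To see the double encoding outside of fluentd, here is a minimal Python sketch. It is only a simplified model, not fluentd's implementation: parse_log_field is a hypothetical helper that mimics the parser filter's second pass, and the sample record is shortened from the one in the question. It ignores time handling and error cases such as non-JSON log lines.

```python
import json

# One line roughly as Docker writes it to /var/log/containers/*.log:
# the application's JSON output is stored as a *string* under "log".
app_line = '{"Application":"customer","level":"info","msg":"HTTP request received"}\n'
docker_line = json.dumps({"log": app_line, "stream": "stdout"})

# First pass: what the JSON parser in the tail source produces.
record = json.loads(docker_line)
assert isinstance(record["log"], str)  # still a string, not an object


def parse_log_field(record, hash_value_field=None, reserve_data=True):
    """Hypothetical model of the second pass done by the parser filter."""
    parsed = json.loads(record["log"])
    # reserve_data keeps the original keys ("log", "stream", ...).
    result = dict(record) if reserve_data else {}
    if hash_value_field is not None:
        result[hash_value_field] = parsed  # parsed fields nested under one key
    else:
        result.update(parsed)              # parsed fields merged at the top level
    return result


nested = parse_log_field(record, hash_value_field="log")
flat = parse_log_field(record)

print(nested["log"]["level"])                     # info
print(flat["level"], type(flat["log"]).__name__)  # info str
```

In this model, hash_value_field "log" replaces the string with a nested object, so the fields show up as log.Application, log.level and so on. As far as I know that matches the documented behaviour of the parser filter, so if you want the fields at the top level of _source as in the question, try the same filter without hash_value_field.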
pbn answered Feb 05 '23 16:02
