
multiline fluentd logs in kubernetes

I am new to fluentd. I have configured the basic fluentd setup I need and deployed it to my Kubernetes cluster as a DaemonSet, and I'm seeing logs shipped to my third-party logging solution. However, I now want to deal with some logs that come in as multiple entries when they really should be one. The logs from the node look like JSON and are formatted like this:

{"log":"2019-09-23 18:54:42,102 [INFO] some message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}
{"log": "another message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}

I have a ConfigMap that looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-config-map
  namespace: logging
  labels:
    k8s-app: fluentd-logzio
data:
  fluent.conf: |-
@include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
@include kubernetes.conf
@include conf.d/*.conf

<match fluent.**>
    # this tells fluentd to not output its log on stdout
    @type null
</match>

# here we read the logs from Docker's containers and parse them
<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag raw.kubernetes.*
  format json
  read_from_head true

</source>

# Detect exceptions in the log output and forward them as one log entry.
<match raw.kubernetes.**>
  @id raw.kubernetes
  @type detect_exceptions
  remove_tag_prefix raw
  message log
  stream stream
  multiline_flush_interval 5
  max_bytes 500000
  max_lines 1000
</match>

# Enriches records with Kubernetes metadata
<filter kubernetes.**>
  @id filter_kubernetes_metadata
  @type kubernetes_metadata
</filter>

<match kubernetes.**>
  @type logzio_buffered
  @id out_logzio
  endpoint_url "https://listener-ca.logz.io?token=####"
  output_include_time true
  output_include_tags true
  <buffer>
    # Set the buffer type to file to improve the reliability and reduce the memory consumption
    @type file
    path /var/log/fluentd-buffers/stackdriver.buffer
    # Set queue_full action to block because we want to pause gracefully
    # in case of the off-the-limits load instead of throwing an exception
    overflow_action block
    # Set the chunk limit conservatively to avoid exceeding the GCL limit
    # of 10MiB per write request.
    chunk_limit_size 2M
    # Cap the combined memory usage of this buffer and the one below to
    # 2MiB/chunk * (6 + 2) chunks = 16 MiB
    queue_limit_length 6
    # Never wait more than 5 seconds before flushing logs in the non-error case.
    flush_interval 5s
    # Never wait longer than 30 seconds between retries.
    retry_max_interval 30
    # Disable the limit on the number of retries (retry forever).
    retry_forever true
    # Use multiple threads for processing.
    flush_thread_count 2
  </buffer>
</match>

My question is: how do I get these log messages shipped as a single entry instead of several?

asked Sep 23 '19 by Matthew The Terrible



1 Answer

There are at least two ways:

multiline plugin

Thanks to @rickerp, who suggested the multiline plugin.

The multiline parser plugin parses multiline logs. This plugin is the multiline version of the regexp parser.

The multiline parser parses logs with the formatN and format_firstline parameters. format_firstline is for detecting the start line of a multiline log. formatN, where N's range is [1..20], is the list of Regexp formats for the multiline log.

Unlike other parser plugins, this plugin needs special code in the input plugin, e.g. to handle format_firstline. So, currently, the in_tail plugin works with multiline, but other input plugins do not.
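Since only in_tail supports it, one way is to tail the bare application log and let the multiline parser stitch records together. Below is a minimal sketch based on the timestamp format in the question ("2019-09-23 18:54:42,102 [INFO] ..."); the path, pos_file, and tag are hypothetical placeholders, not values from the question's config:

<source>
  @type tail
  # hypothetical path, pos_file and tag -- adjust for your environment
  path /var/log/app/application.log
  pos_file /var/log/app/application.log.pos
  tag app.logs
  <parse>
    @type multiline
    # a new record starts with a timestamp like "2019-09-23 18:54:42,102"
    format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
    # everything up to the next matching first line is folded into "message"
    format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<level>[^\]]+)\] (?<message>.*)/
    time_format %Y-%m-%d %H:%M:%S,%L
  </parse>
</source>

Keep in mind that on Kubernetes the container runtime wraps every line in a JSON record (the {"log": ...} entries shown in the question), so this parser fits the raw application log best; for the wrapped records, the concat plugin below is usually the better fit.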

fluent-plugin-concat plugin

As per the fluentd documentation, fluent-plugin-concat solves this:

Concatenate multiple-line log messages.

The application log is stored in the "log" field of the records. You can concatenate these logs by using the fluent-plugin-concat filter before sending them to their destinations:

<filter docker.**>
  @type concat
  key log
  stream_identity_key container_id
  multiline_start_regexp /^-e:2:in `\/'/
  multiline_end_regexp /^-e:4:in/
</filter>

Original events:

2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:2:in `do_division_by_zero'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:4:in `<main>'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}

Filtered events:

2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'\n-e:2:in `do_division_by_zero'\n-e:4:in `<main>'"}

With this plugin, you'll need to adapt the regexes to match your own log format.
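Adapted to the records in the question, a sketch might look like this. It keys on the Docker "log" field and treats any line starting with the application's timestamp as the beginning of a new record; anything else (for example, stack-trace lines) is appended to the previous one. The tag and flush_interval here are assumptions; with the question's config, this filter would sit before the detect_exceptions match so it sees the raw.kubernetes.** events:

<filter raw.kubernetes.**>
  @type concat
  key log
  # a new logical record begins with the timestamp the app prints
  multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
  # flush a pending record if no further line arrives within 5 seconds
  flush_interval 5
</filter>

Note that records still buffered when flush_interval expires are emitted through fluentd's error stream unless you set timeout_label, so you may want to route those explicitly.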

answered Oct 18 '22 by Yasen