I am new to fluentd. I have configured the basic fluentd setup I need and deployed it to my Kubernetes cluster as a DaemonSet. I'm seeing logs shipped to my third-party logging solution. However, I now want to deal with some logs that are coming in as multiple entries when they really should be one. The logs from the node look like JSON and are formatted like this:
{"log":"2019-09-23 18:54:42,102 [INFO] some message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}
{"log": "another message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}
I have a ConfigMap that looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-config-map
  namespace: logging
  labels:
    k8s-app: fluentd-logzio
data:
  fluent.conf: |-
    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include kubernetes.conf
    @include conf.d/*.conf

    <match fluent.**>
      # this tells fluentd to not output its log on stdout
      @type null
    </match>

    # here we read the logs from Docker's containers and parse them
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>

    <match kubernetes.**>
      @type logzio_buffered
      @id out_logzio
      endpoint_url "https://listener-ca.logz.io?token=####"
      output_include_time true
      output_include_tags true
      <buffer>
        # Set the buffer type to file to improve the reliability and reduce the memory consumption
        @type file
        path /var/log/fluentd-buffers/stackdriver.buffer
        # Set queue_full action to block because we want to pause gracefully
        # in case of the off-the-limits load instead of throwing an exception
        overflow_action block
        # Set the chunk limit conservatively to avoid exceeding the GCL limit
        # of 10MiB per write request.
        chunk_limit_size 2M
        # Cap the combined memory usage of this buffer and the one below to
        # 2MiB/chunk * (6 + 2) chunks = 16 MiB
        queue_limit_length 6
        # Never wait more than 5 seconds before flushing logs in the non-error case.
        flush_interval 5s
        # Never wait longer than 30 seconds between retries.
        retry_max_interval 30
        # Disable the limit on the number of retries (retry forever).
        retry_forever true
        # Use multiple threads for processing.
        flush_thread_count 2
      </buffer>
    </match>
My question is: how do I get these log messages shipped as a single entry instead of multiple entries?
There are at least two ways:

multiline plugin

Thanks to @rickerp, who suggested the multiline plugin.
The multiline parser plugin parses multiline logs. This plugin is the multiline version of the regexp parser.
The multiline parser parses logs with the formatN and format_firstline parameters. format_firstline is for detecting the start line of a multiline log entry. formatN, where N's range is [1..20], is the list of regexp formats for the multiline log.
Unlike other parser plugins, this plugin needs special code in the input plugin, e.g. to handle format_firstline. So, currently, the in_tail plugin works with multiline, but other input plugins do not.
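For illustration, here is a minimal sketch of the multiline parser inside in_tail (untested; the path, tag, and regexes are my assumptions and would need adjusting). Note it fits tailing an application's plain-text log files; the /var/log/containers/*.log files in your config are Docker's JSON wrapper, where each raw line is a single JSON record:

<source>
  @type tail
  path /var/log/app/*.log              # assumed path, adjust to your files
  pos_file /var/log/app.log.pos
  tag app.*
  <parse>
    @type multiline
    # a new entry starts with "YYYY-MM-DD HH:MM:SS,mmm"
    format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
    # capture time, level, and the (possibly multiline) message
    format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<level>\w+)\] (?<message>.*)/
    time_format %Y-%m-%d %H:%M:%S,%L
  </parse>
</source>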
fluent-plugin-concat plugin

As per the fluentd documentation, fluent-plugin-concat solves this:

Concatenate multiple lines log messages

The application log is stored in the "log" field of the records. You can concatenate these logs by using the fluent-plugin-concat filter before sending them to their destinations:

<filter docker.**>
  @type concat
  key log
  stream_identity_key container_id
  multiline_start_regexp /^-e:2:in `\/'/
  multiline_end_regexp /^-e:4:in/
</filter>
Original events:
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:2:in `do_division_by_zero'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:4:in `<main>'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}
Filtered events:
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'\n-e:2:in `do_division_by_zero'\n-e:4:in `<main>'"}
With this plugin, you'll need to adjust the regexes to match your own log format.
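For the log lines in your question, a minimal sketch (untested; the tag pattern, separator, and flush_interval are my assumptions, not part of the plugin's documentation example) would treat any log value starting with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp as the start of a new entry and append everything else to the previous one. Placed before your detect_exceptions match, it would see the raw.kubernetes.** tag:

<filter raw.kubernetes.**>
  @type concat
  key log
  # a new entry starts with "YYYY-MM-DD HH:MM:SS,mmm"
  multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
  # each Docker "log" value already ends with \n, so join with no extra separator
  separator ""
  # flush a dangling entry if no new line arrives within 5 seconds
  flush_interval 5
</filter>

Note that, if I recall the plugin's behavior correctly, entries flushed by the timeout are emitted through the @ERROR label unless timeout_label is set, so you may want to handle that case as well.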