I'm using NLog with the Elasticsearch target to forward logs to an AWS Elasticsearch Service cluster, for visualisations in Kibana.
This works fine, but I am concerned about using it in production because of ES cluster availability and the impact of a cluster failover, since the logs are sent over HTTP using the elasticsearch-net client.
I am considering using a different NLog target that sends the logs to a more reliable destination (File, S3?) and then having something else (Logstash, AWS Lambda) pick them up and send them to ES, to minimise the risk to the application itself.
I'd like to hear your thoughts.
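For the File option, a minimal sketch (not part of the original post) of an NLog File target that writes one JSON document per line, so that a shipper such as Logstash could tail the files and forward the documents to ES. The file path and the attribute names are hypothetical choices, and retention/archiving is left out.

```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      throwExceptions="false">
  <targets>
    <!-- One JSON document per line; a shipper (e.g. Logstash) tails these
         files and indexes the documents into Elasticsearch.
         The path below is hypothetical. -->
    <target name="jsonFile" xsi:type="File"
            fileName="c:\logs\app-${shortdate}.json">
      <layout xsi:type="JsonLayout" includeAllProperties="true">
        <!-- ISO 8601 UTC timestamp, easy for the shipper to parse -->
        <attribute name="time" layout="${date:universalTime=true:format=o}" />
        <attribute name="level" layout="${level:upperCase=true}" />
        <attribute name="logger" layout="${logger}" />
        <attribute name="host" layout="${machinename}" />
        <attribute name="threadid" layout="${threadid}" />
        <attribute name="message" layout="${message}" />
        <attribute name="exception" layout="${exception:format=tostring}" />
      </layout>
    </target>
  </targets>
  <rules>
    <logger name="*" minlevel="Debug" writeTo="jsonFile" />
  </rules>
</nlog>
```

Writing to a local file keeps the request path independent of the ES cluster; the trade-off is that delivery to ES now depends on the shipper and on disk space.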
UPDATE
My main concern is app availability; to prevent missing logs, a secondary target is used.
I'm using the latest NLog with throwExceptions set to false. I'm not using async targets at this point, but I'm considering them, as we have a lot of async code.
To give a bit more context, the "app" is a set of APIs (WebAPI and WCF) that receive 10-15K requests per minute (RPM).
Scenario
A request comes in and the ES cluster is unavailable.
Case 1 - NLog without async target
```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"
      autoReload="true"
      throwExceptions="false"
      internalLogLevel="Off"
      internalLogFile="c:\temp\nlog-internal.log">
  <targets>
    <target name="elastic"
            xsi:type="BufferingWrapper"
            flushTimeout="5000">
      <target xsi:type="ElasticSearch"
              layout="${logger} | ${threadid} | ${message}"
              index="logstash-${date:format=yyyy.MM.dd}"
              includeAllProperties="true"
              uri="...">
        <field name="user"
               layout="${windows-identity:userName=True:domain=False}"/>
        <field name="host"
               layout="${machinename}"/>
        <field name="number"
               layout="1"
               layoutType="System.Int32"/>
      </target>
    </target>
  </targets>
  <rules>
    <logger name="*"
            minlevel="Debug"
            writeTo="elastic" />
  </rules>
</nlog>
```
Q: What happens with the main thread when the target can't be reached?
Case 2 - NLog with async target
Using an AsyncWrapper around the ElasticSearch target with queueLimit="10000" batchSize="100".
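A minimal sketch of what that Case 2 configuration might look like, assuming the same inner ElasticSearch target as in Case 1 (its field elements are omitted for brevity). overflowAction="Discard" is the default and is spelled out only to make it explicit; the extensions element assumes the NLog.Targets.ElasticSearch package is the one in use.

```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      throwExceptions="false">
  <extensions>
    <add assembly="NLog.Targets.ElasticSearch" />
  </extensions>
  <targets>
    <!-- The request thread only enqueues the event; a background thread
         writes queued events to the inner target in batches of up to 100 -->
    <target name="elastic" xsi:type="AsyncWrapper"
            queueLimit="10000"
            batchSize="100"
            overflowAction="Discard">
      <target xsi:type="ElasticSearch"
              layout="${logger} | ${threadid} | ${message}"
              index="logstash-${date:format=yyyy.MM.dd}"
              includeAllProperties="true"
              uri="..." />
    </target>
  </targets>
  <rules>
    <logger name="*" minlevel="Debug" writeTo="elastic" />
  </rules>
</nlog>
```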
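Q:
- Is another thread [B] created?
- Will subsequent requests reuse thread [B] and queue the logging requests?
- What happens when the queueLimit is reached?
- Will additional threads [B1 ... Bn] be started? (This would flood the connection pool.)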
Good question.
There is nothing to worry about, but configuring NLog correctly is important.
I'm not sure which should be reliable, keeping the program running or not losing a log message, so for both cases:
- If you are afraid of losing some log messages
- If you are afraid that logging could break your application:
  - keep throwExceptions disabled (it is disabled by default);
  - use async: errors are then written to the target on another thread, so they cannot break your app;
  - when using async, check the overflow and queue settings.
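One way to address both concerns, sketched under assumptions: wrap a FallbackGroup in an AsyncWrapper, so writes happen off the request thread and, when a write to Elasticsearch fails, the event goes to a local file instead. This assumes the ElasticSearch target reports failed writes back to NLog, which is worth verifying for the package version in use; the fallback file path and layout are hypothetical.

```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      throwExceptions="false">
  <extensions>
    <add assembly="NLog.Targets.ElasticSearch" />
  </extensions>
  <targets>
    <!-- Background queue: the request thread only enqueues the event -->
    <target name="main" xsi:type="AsyncWrapper"
            queueLimit="10000"
            overflowAction="Discard">
      <!-- Writes to the first target; on failure, moves on to the next one.
           returnToFirstOnSuccess="true": after any successful write,
           the next event starts again at Elasticsearch. -->
      <target xsi:type="FallbackGroup" returnToFirstOnSuccess="true">
        <target xsi:type="ElasticSearch"
                index="logstash-${date:format=yyyy.MM.dd}"
                includeAllProperties="true"
                uri="..." />
        <!-- Hypothetical local fallback file -->
        <target xsi:type="File"
                fileName="c:\logs\fallback-${shortdate}.log"
                layout="${longdate} | ${level} | ${logger} | ${message}" />
      </target>
    </target>
  </targets>
  <rules>
    <logger name="*" minlevel="Debug" writeTo="main" />
  </rules>
</nlog>
```

With returnToFirstOnSuccess enabled, every event during an outage is first tried against Elasticsearch, so slow connection timeouts are paid on the background thread rather than on requests; setting it to false would keep using the file until the file target itself fails.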
Update
Case 1
What happens with the main thread when the target can't be reached?
Nothing. The main thread queues the messages in a buffer, and another thread (started by a Timer) processes those messages. If that fails and throwExceptions is not enabled, the errors are only written to the internal log (when it is enabled). All exceptions are caught. You will lose the messages when writing to the target fails.
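The Case 1 configuration sets internalLogLevel="Off", so these swallowed errors are not recorded anywhere. A minimal sketch of the one change that makes them visible: raise internalLogLevel (Warn is a suggestion here; Error would also capture failed writes) and keep the internalLogFile path already used in the question.

```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      autoReload="true"
      throwExceptions="false"
      internalLogLevel="Warn"
      internalLogFile="c:\temp\nlog-internal.log">
  <!-- targets and rules unchanged from the Case 1 configuration;
       failed writes to Elasticsearch now appear in the internal log -->
</nlog>
```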
Case 2
Is another thread [B] created?
One Timer will be created, which uses a thread to process the messages.
Will subsequent requests reuse thread [B] and queue the logging requests?
Yes, but there is no guarantee it will be the same thread: the timer takes a thread from the thread pool. NB: only one such thread is active at a time.
What happens when the queueLimit is reached?
It depends on your configuration. By default, messages are discarded (check the overflow/queue settings). This is the safest option in terms of memory and CPU. You can choose to discard, block (which stops the main thread), or grow the queue (be aware of memory usage).
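To put queueLimit in perspective: assuming, hypothetically, one log event per request, 15K RPM is 250 events per second, so a queue of 10000 fills after about 40 seconds of ES unavailability (sooner with several Debug events per request), after which overflowAction applies. The sketch below lists the three options; Grow is chosen only to illustrate the syntax, and Discard remains the safest default.

```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <extensions>
    <add assembly="NLog.Targets.ElasticSearch" />
  </extensions>
  <targets>
    <!-- overflowAction decides what happens once queueLimit events are queued:
         Discard: drop events (default); memory stays bounded, messages are lost
         Block:   the logging (request) thread waits until the queue has room
         Grow:    keep queueing; nothing is dropped, but memory can grow
                  without limit while the ES cluster is unavailable -->
    <target name="elastic" xsi:type="AsyncWrapper"
            queueLimit="10000"
            batchSize="100"
            overflowAction="Grow">
      <target xsi:type="ElasticSearch" uri="..." />
    </target>
  </targets>
  <rules>
    <logger name="*" minlevel="Debug" writeTo="elastic" />
  </rules>
</nlog>
```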
Will additional threads [B1 ... Bn] be started? (This would flood the connection pool.)
No: one Timer, using the thread pool. For details, check the MSDN page for Timer, or the reference source.