We have IoT devices that usually have a good internet connection, but the network can go down. When that happens, the device itself will do the right thing (although it can no longer be actively controlled). We would still like to get metrics data for the period during which the network is down.
This means a device-local Telegraf would need to collect the metrics, store them, and check the network connection. Once the network is up again, it would forward them to an InfluxDB instance, for example.
Is it possible to achieve this scenario with Telegraf/InfluxDB or Prometheus?
Telegraf cannot store metrics on the local drive in case of a failure. However, it can buffer metrics that failed to send (in RAM, I believe) and flush the buffer on the next successful write. Take a look at the metric_buffer_limit option in the Telegraf config:
# Configuration for telegraf agent
[agent]
## For failed writes, telegraf will cache metric_buffer_limit metrics for each
## output, and will flush this buffer on a successful write. Oldest metrics
## are dropped first when this buffer fills.
## This buffer only fills when writes fail to output plugin(s).
metric_buffer_limit = 10000
This way, as long as you don't overflow this buffer, metrics collected while InfluxDB is unreachable will still be preserved and resent later.
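To judge whether the buffer would overflow during an outage, a rough sizing estimate can help. The sketch below is not part of Telegraf. The function names, the headroom factor, and the example numbers are assumptions for illustration, and it assumes a steady number of metrics per collection interval. The second function only models the "oldest metrics are dropped first" behaviour described in the config comments, using a bounded deque.

```python
import math
from collections import deque


def required_buffer_limit(metrics_per_interval: int,
                          interval_seconds: float,
                          outage_seconds: float,
                          headroom: float = 1.2) -> int:
    """Estimate a metric_buffer_limit large enough to ride out an outage.

    Hypothetical helper: assumes every collection interval produces the
    same number of metrics and adds a safety margin (headroom).
    """
    intervals = math.ceil(outage_seconds / interval_seconds)
    return math.ceil(metrics_per_interval * intervals * headroom)


def simulate_outage(buffer_limit: int,
                    metrics_per_interval: int,
                    intervals: int) -> deque:
    """Model a bounded buffer that drops the oldest entries when full.

    This is only an illustration of the documented behaviour, not
    Telegraf's actual implementation.
    """
    buffer = deque(maxlen=buffer_limit)
    for tick in range(intervals):
        for metric_id in range(metrics_per_interval):
            # When the deque is full, append() discards the oldest item.
            buffer.append((tick, metric_id))
    return buffer


if __name__ == "__main__":
    # Assumed example: 50 metrics every 10 s, one-hour outage.
    print(required_buffer_limit(50, 10, 3600))
    # With a limit of 100 and 150 metrics produced, the first 50 are lost.
    survivors = simulate_outage(100, 10, 15)
    print(len(survivors), survivors[0])
```

For example, 50 metrics every 10 seconds over a one-hour outage amounts to 18,000 metrics before any headroom, which already exceeds the limit of 10000 shown in the config above, so the oldest metrics would be dropped. Since the buffer appears to live in memory, a larger limit also means more RAM use on the device.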
Edit: you can track a similar feature request here.