I'm trying to configure a SpringBoot application to export metrics to InfluxDB to visualise them using a Grafana dashboard. I'm using this dashboard as an example which uses Prometheus as a backend. For some metrics I have no problem figuring out how to create graphs for them but for some others I don't know how to create the graphs or even if it's possible at all. So I enumerate the things I'm not really sure about in the following points:
Is there any documentation where a value unit is described? The application I'm using as an example doesn't have any load on it so sometimes I don't know whether the value is a bit, a byte, a second, a millisecond, a count, etc.
Some measurements contain the tag 'metric_type = histogram' with fields 'count', 'sum', 'mean' and 'upper'. Again, here I don't know what the value units are, what upper means or how I'm suppose to plot them. Examples of this are 'http_server_requests' or 'jvm_gc_pause'.
From what I see in the Grafana dashboard example, it seems I should use these measurements of type histogram to create both a graph with counts and graphs with duration. For example I see I should be able to create a graph with the number of requests and another one with their duration. Or for the garbage collector, I should be able to provide a graph for the number of minor and major GCs and another for their duration.
As an example of measures I get inserted into InfluxDB:
time count exception mean method metric_type outcome status sum upper uri
1625579637946000000 1 None 0.892144 GET histogram SUCCESS 200 0.892144 0.892144 /actuator/health
or
time action cause count mean metric_type sum upper
1625581132316000000 end of minor GC Allocation Failure 1 2 histogram 2 2
I agree the documentation for micrometer is not great. I've had to dig through the code to find answers...
Regarding your questions about jvm_gc_pause, it is a Timer and the implementation is AbstractTimer which is a class that wraps a Histogram among other components. This particular metric is registered by the JvmGcMetrics class. The various measurements that are published to InfluxDB are determined by the InfluxMeterRegistry.writeTimer(Timer timer) method:
timer.totalTime(getBaseTimeUnit()) // The total time of recorded eventstimer.count() // The number of times stop has been called on the timertimer.mean(getBaseTimeUnit()) // totalTime()/count()timer.max(getBaseTimeUnit()) // The max time of a single eventThe base time unit is milliseconds.
Similarly, http_server_requests appears to be a Timer as well.
I believe you are correct that the sensible thing is to chart on two separate Grafana panels: one panel for GC pause seconds using sum (or mean or upper), and one panel for GC events using count.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With