
Reporting JVM's CPU usage with Dropwizard metrics

I use Dropwizard metrics to measure various metrics in my application. There are several predefined reporters for JVM instrumentation, but strangely I could not find any that reports CPU usage.

I could create my own Gauge (using getThreadCpuTime or similar), but my best guess is that I am missing something.

Did I miss it in the current implementation, or is it more complex than I first thought?

Benoît asked Oct 17 '16 15:10


1 Answer

I don't know much about Dropwizard, but I've used ThreadMXBean in the past to provide estimates of CPU utilization in scalable distributed computing systems so I'll share what I think is relevant to the question. Things are definitely more complicated than they may first appear:

ThreadMXBean is somewhat misleading ...

ThreadMXBean.getThreadCpuTime(id) only returns the total time, in nanoseconds, that a particular thread has spent executing code on the CPU since the thread started. It provides no information on how long your thread may have been blocked or waiting (for example, sleeping), so on its own it doesn't give you a good idea of CPU usage. You also need to measure total blocked and waited time, and then keep track of all three values over the runtime of your program. Oddly enough, ThreadMXBean has no methods to directly obtain blocked/waited time, so you may be tempted to give up.
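
To make the difference between CPU time and wall-clock time concrete, here is a minimal sketch that measures both while the calling thread sleeps. The class and method names are made up for illustration, and it assumes the JVM supports CPU timing for the current thread; where it doesn't, the CPU figure is reported as -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Illustrative demo class; the name is hypothetical.
final class CpuTimeVsWallTime {

    /**
     * Sleeps for sleepMillis and returns {wallNanos, cpuNanos} for the
     * calling thread. cpuNanos is -1 if the JVM cannot measure CPU time
     * for the current thread.
     */
    static long[] measureSleep(long sleepMillis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean supported = bean.isCurrentThreadCpuTimeSupported()
                && bean.isThreadCpuTimeEnabled();

        long wallStart = System.nanoTime();
        long cpuStart = supported ? bean.getCurrentThreadCpuTime() : -1;
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        long wall = System.nanoTime() - wallStart;
        long cpu = supported ? bean.getCurrentThreadCpuTime() - cpuStart : -1;
        return new long[] {wall, cpu};
    }

    public static void main(String[] args) {
        long[] r = measureSleep(200);
        System.out.printf("wall = %d ms, cpu = %d ms%n",
                r[0] / 1_000_000, r[1] < 0 ? -1 : r[1] / 1_000_000);
    }
}
```

While the thread sleeps, wall-clock time keeps advancing but the thread accrues almost no CPU time, which is why the CPU-time counter alone says nothing about time spent blocked or waiting.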

... but you can use it to get a ThreadInfo object ...

First, enable both features with these two calls (either one throws an UnsupportedOperationException if your JVM doesn't support the feature):

ManagementFactory.getThreadMXBean().setThreadCpuTimeEnabled(true);
ManagementFactory.getThreadMXBean().setThreadContentionMonitoringEnabled(true);
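
If you would rather not rely on catching the exception, ThreadMXBean also lets you ask whether each feature is supported before switching it on. A rough sketch of that, with a hypothetical helper class and method name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; not part of the original answer.
final class ThreadMonitoringSetup {

    /** Enables both features where supported; returns true only if both are on. */
    static boolean enableThreadMonitoring() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpu = false;
        boolean contention = false;
        try {
            if (bean.isThreadCpuTimeSupported()) {
                bean.setThreadCpuTimeEnabled(true);
                cpu = bean.isThreadCpuTimeEnabled();
            }
            if (bean.isThreadContentionMonitoringSupported()) {
                bean.setThreadContentionMonitoringEnabled(true);
                contention = bean.isThreadContentionMonitoringEnabled();
            }
        } catch (SecurityException e) {
            // A security manager may refuse the management "control" permission;
            // in that case we report whatever was enabled before the failure.
        }
        return cpu && contention;
    }

    public static void main(String[] args) {
        System.out.println("thread monitoring enabled: " + enableThreadMonitoring());
    }
}
```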

Now you can call ThreadMXBean.getThreadInfo(threadId) to get an instance of ThreadInfo corresponding to a particular thread (or null if that thread is not alive). This info object has two methods, getBlockedTime() and getWaitedTime(), which return the total number of milliseconds your thread has spent in each of those states, or -1 if contention monitoring is disabled. There is no getCpuTime() method (which, if you ask me, is a tremendously silly shortcoming of this object), but if you know when your thread was started, you can do something like this:

// Initialized somewhere else:
ThreadMXBean bean = ...
long threadStartTime = System.currentTimeMillis(); // taken when the thread is started
Thread myThread = ...

// Inside your metrics-gathering code:
long now = System.currentTimeMillis();
ThreadInfo info = bean.getThreadInfo(myThread.getId()); // null if the thread is not alive
// Wall-clock lifetime minus time spent blocked or waiting, in milliseconds
long totalCpuTime = (now - threadStartTime) - (info.getBlockedTime() + info.getWaitedTime());

Now you can compute Thread utilization as a percentage.

We're almost there, but we're not quite done yet. Each time we go through the final three lines of the code I posted above, we're only gathering total times for executing/blocked/waiting states of our thread. To compute a percentage, we need to keep track of when we gathered these metrics so we can know how much time the thread spent in each of those states since the last metrics update. So, do something like this:

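class ThreadUsageMetrics {
    // Snapshot time and cumulative blocked/waited totals, all in milliseconds
    long timestamp, totalBlockedTime, totalWaitTime;

    ThreadUsageMetrics(long ts, long blocked, long wait) {
        timestamp = ts;
        totalBlockedTime = blocked;
        totalWaitTime = wait;
    }

    // Fraction of the interval since prev spent neither blocked nor waiting
    double computeCpuUsageSince(ThreadUsageMetrics prev) {
        long time = timestamp - prev.timestamp;
        if (time <= 0) {
            return 0.0; // no time has elapsed; avoid dividing by zero
        }
        long blocked = totalBlockedTime - prev.totalBlockedTime;
        long waited = totalWaitTime - prev.totalWaitTime;
        return (time - (blocked + waited)) / (double) time;
    }
}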

This gives us a double in the range 0.0 to 1.0, indicating CPU usage as a fraction of the total time since the last metrics update. I'm assuming you can convert this value into a percentage (multiply by 100) and feed it to an instance of Dropwizard's Gauge every 5 seconds or so. On my project, this is how we have estimated CPU usage for several years, and it has worked great for us.
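
Putting the pieces together, a sampler along these lines could sit behind a gauge. This is only a sketch under a few assumptions: the class name ThreadUsageSampler is made up, it uses the standard Supplier interface rather than any Dropwizard type so that it compiles without extra dependencies, and it clamps the result because the JVM may only update blocked/waited totals when a wait ends. If contention monitoring is unavailable, the blocked/waited totals are treated as zero, so the reported value degrades to 1.0.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.function.Supplier;

// Hypothetical helper: each get() returns the fraction of wall-clock time the
// target thread spent neither blocked nor waiting since the previous get().
final class ThreadUsageSampler implements Supplier<Double> {
    private final ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    private final long threadId;
    private long lastTimestamp;
    private long lastBlocked;
    private long lastWaited;

    ThreadUsageSampler(Thread thread) {
        this.threadId = thread.getId();
        if (bean.isThreadContentionMonitoringSupported()) {
            bean.setThreadContentionMonitoringEnabled(true);
        }
        ThreadInfo info = bean.getThreadInfo(threadId); // null if not alive
        lastTimestamp = System.currentTimeMillis();
        lastBlocked = info == null ? 0 : Math.max(0, info.getBlockedTime());
        lastWaited = info == null ? 0 : Math.max(0, info.getWaitedTime());
    }

    @Override
    public synchronized Double get() {
        ThreadInfo info = bean.getThreadInfo(threadId);
        if (info == null) {
            return 0.0; // thread not started yet or already finished
        }
        long now = System.currentTimeMillis();
        // getBlockedTime()/getWaitedTime() return -1 when monitoring is off
        long blocked = Math.max(0, info.getBlockedTime());
        long waited = Math.max(0, info.getWaitedTime());

        long elapsed = now - lastTimestamp;
        long idle = (blocked - lastBlocked) + (waited - lastWaited);
        lastTimestamp = now;
        lastBlocked = blocked;
        lastWaited = waited;

        if (elapsed <= 0) {
            return 0.0; // sampled twice within the same millisecond
        }
        double usage = (elapsed - idle) / (double) elapsed;
        return Math.max(0.0, Math.min(1.0, usage)); // clamp to [0, 1]
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    long end = System.nanoTime() + 50_000_000L; // spin ~50 ms
                    while (System.nanoTime() < end) { /* busy */ }
                    Thread.sleep(50);                           // idle ~50 ms
                }
            } catch (InterruptedException e) {
                // exit quietly
            }
        });
        worker.setDaemon(true);
        worker.start();
        ThreadUsageSampler sampler = new ThreadUsageSampler(worker);
        for (int i = 0; i < 3; i++) {
            Thread.sleep(1000);
            System.out.printf("usage = %.2f%n", sampler.get());
        }
    }
}
```

Since Dropwizard's Gauge is a single-method interface (getValue()), something like registry.register("worker.cpu-usage", (Gauge<Double>) sampler::get) should hook this up to a MetricRegistry; the metric name here is invented, and the exact call is worth checking against the Metrics version you use.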

A couple of notes on this: we don't actually need to explicitly store total CPU time in this object, because any time not spent blocking or waiting is either execution time or time spent context switching. We have no way to measure context-switch time, but it's safe to assume that it is negligible in the vast majority of cases.

Here's the caveat - we aren't truly measuring CPU usage.

If you've read carefully, you'll notice I've said we're "estimating" CPU usage. The reason is that we're measuring the total execution time of a particular Java thread. Java provides no concept of actual CPU hardware usage; it's merely the total time a thread has spent executing. This is further muddied by things like Hyper-Threading, where time spent "executing" may actually mean time spent waiting for the other thread to get off the ALU or memory bus. I think this provides a good measure of when code is running on a physical hardware thread, but if you want to measure actual CPU usage, you won't be able to do it in pure Java.
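
For a process-wide figure rather than a per-thread estimate, HotSpot-derived JDKs expose an extension interface, com.sun.management.OperatingSystemMXBean, whose getProcessCpuLoad() reports the load the operating system attributes to the JVM process. This is a JDK-specific extension rather than part of the standard java.lang.management API, so the sketch below (with a hypothetical class name) falls back to -1 when it is absent.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper; relies on a JDK-specific extension interface.
final class ProcessCpuLoad {

    /**
     * Returns the recent CPU load of the JVM process in [0.0, 1.0], or a
     * negative value if the figure is not available on this JVM.
     */
    static double processCpuLoad() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0; // extension interface not present on this JVM
    }

    public static void main(String[] args) {
        System.out.println("process CPU load: " + processCpuLoad());
    }
}
```

The first call can return a negative value while the JVM gathers its initial sample, so a gauge built on this should treat negative readings as "unknown" rather than as zero load.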

CodeBlind answered Nov 03 '22 18:11
