
Java: how to repair a hung thread?

Please note: I'm tagging this with JClouds because if you read the entire question and comments that ensue, I believe this to be either a bug with JClouds or a misuse of that library.

I have an executable JAR that runs, works for a while, finishes the work without throwing any errors/exceptions, and then hangs forever when it should be exiting. I profiled it with VisualVM (paying attention to the running threads), and I also tossed in a log statement to print at the point (at the end of the main() method) where the app hangs. Here is the last part of my main method:

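// Requires: import java.util.Set;
Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
for (Thread t : threadSet) {
    String daemon = t.isDaemon() ? "Yes" : "No";
    // Java does not interpolate ${...} inside string literals, so format the values explicitly.
    System.out.printf("The %s thread is currently running; is it a daemon? %s.%n", t.getName(), daemon);
}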

When my JAR executes this code, I see the following output:

The com.google.inject.internal.util.Finalizer thread is currently running; is it a daemon? Yes.
The Signal Dispatcher thread is currently running; is it a daemon? Yes.
The RMI Scheduler(0) thread is currently running; is it a daemon? Yes.
The Attach Listener thread is currently running; is it a daemon? Yes.
The user thread 3 thread is currently running; is it a daemon? No.
The Finalizer thread is currently running; is it a daemon? Yes.
The RMI TCP Accept-0 thread is currently running; is it a daemon? Yes.
The main thread is currently running; is it a daemon? No.
The RMI TCP Connection(1)-10.10.99.8 thread is currently running; is it a daemon? Yes.
The Reference Handler thread is currently running; is it a daemon? Yes.
The JMX server connection timeout 24 thread is currently running; is it a daemon? Yes.

I don't think I have to worry about daemons (correct me if I'm wrong), so filtering that to non-daemons:

The user thread 3 thread is currently running; is it a daemon? No.
The main thread is currently running; is it a daemon? No.

Obviously, the main thread is still running because something is preventing it from exiting. Hmmm, user thread 3 looks interesting. What does VisualVM tell us?

[Screenshot: VisualVM thread view]

This is the thread view at the point that the app was hanging (what was happening while the console output above was printing). Hmmm user thread 3 is looking even more suspicious!

So before killing the app I took a thread dump. Here is the stacktrace for user thread 3:

"user thread 3" prio=6 tid=0x000000000dfd4000 nid=0x2360 waiting on condition [0x00000000114ff000]
    java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000782cba410> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    Locked ownable synchronizers:
        - None

I've never had to analyze one of these before, so it is gibberish to me (but perhaps not to a trained eye!).

After killing the app, VisualVM's timeline stops ticking/incrementing every second, and I can scroll horizontally backwards in the timeline to where user thread 3 was created and began its life as a nagging thread:

[Screenshot: VisualVM timeline at the point where user thread 3 was created]

However, I cannot figure out how to tell where in the code user thread 3 is being created. So I ask:

  • How can I tell what is creating user thread 3, and where it is being created (especially since I suspect it's a 3rd-party OSS lib that is creating the thread)?
  • How can I triage, diagnose and fix this thread hanging?
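
One possible way to approach the first question, at least for pools you construct yourself, is to hand the executor a ThreadFactory that records the call stack at the moment each thread is created. The sketch below is only an illustration: TracingThreadFactory, its "traced-thread-N" naming and the demo() method are hypothetical names, not part of the JDK or of any library mentioned here. It will not help directly with a pool created inside third-party code; for that, a debugger breakpoint in the Thread constructor is probably the more practical route.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a library class): remembers the stack that was
// active when each pool thread was created.
class TracingThreadFactory implements ThreadFactory {
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final AtomicInteger counter = new AtomicInteger();
    private final Map<String, Throwable> creationSites = new ConcurrentHashMap<>();

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r);
        t.setName("traced-thread-" + counter.incrementAndGet());
        // The Throwable is never thrown; it only carries the creation stack trace.
        creationSites.put(t.getName(), new Throwable("creation site of " + t.getName()));
        return t;
    }

    Map<String, Throwable> creationSites() {
        return creationSites;
    }

    // Runs one task on a traced pool, prints the recorded creation sites,
    // and returns how many threads the factory was asked to create.
    static int demo() {
        TracingThreadFactory factory = new TracingThreadFactory();
        ExecutorService pool = Executors.newCachedThreadPool(factory);
        try {
            pool.execute(() -> { });
        } finally {
            pool.shutdown();
        }
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Throwable site : factory.creationSites().values()) {
            site.printStackTrace();
        }
        return factory.creationSites().size();
    }

    public static void main(String[] args) {
        demo();
    }
}
```

The printed stack trace includes the frame that called execute(), because the pool creates its worker thread on the submitting thread.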

Update:

Here is my code that is firing around the same time that user thread 3 seems to be getting created:

ExecutorService myExecutor = Executors.newCachedThreadPool();
for(Node node : nodes) {
    BootstrapAndKickTask bootAndKickTask = new BootstrapAndKickTask(node, ctx);
    myExecutor.execute(bootAndKickTask);
}

myExecutor.shutdown();
if(!myExecutor.awaitTermination(15, TimeUnit.MINUTES)) {
    TimeoutException toExc = new TimeoutException("Hung after the 15 minute timeout was reached.");
    log.error(toExc);

    throw toExc;
}

Also here is my GitHub Gist which contains the full thread dump.

DirtyMikeAndTheBoys asked Jan 10 '23 22:01


2 Answers

What appears to be happening, though I can't confirm it without code, is that you are forgetting to call shutdown() or shutdownNow() on an ExecutorService. You appear to be leaving a ThreadPoolExecutor globally reachable and still running when your main thread exits. Because it is still reachable, the ExecutorService never has its finalize method called and never shuts itself down. By default, threads created for an ExecutorService are non-daemon, so they will happily keep running long after they are needed.

You should either provide code for us to look at, or look through your code to where you use a ThreadPoolExecutor, and properly shut it down after you are done using it.
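
A minimal sketch of the shutdown pattern being described, assuming a pool you own: shut down in a finally block, then wait for termination with a timeout. ShutdownExample and runAndShutDown are made-up names for illustration. Note that the snippet in the question already calls shutdown() on myExecutor, and threads from Executors.newCachedThreadPool() are named "pool-N-thread-M" by default, so the worker named "user thread 3" quite possibly belongs to a different executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownExample {
    // Submits the tasks, always shuts the pool down, and returns true if the
    // pool terminated within the timeout.
    static boolean runAndShutDown(int taskCount, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                pool.execute(() -> System.out.println("task " + id + " done"));
            }
        } finally {
            // Stop accepting new work; idle workers exit once the queue drains.
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                // Interrupt tasks that are still running after the timeout.
                pool.shutdownNow();
                return false;
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated cleanly: " + runAndShutDown(3, 10));
    }
}
```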

According to the docs:

A pool that is no longer referenced in a program AND has no remaining threads will be shutdown automatically. If you would like to ensure that unreferenced pools are reclaimed even if users forget to call shutdown(), then you must arrange that unused threads eventually die, by setting appropriate keep-alive times, using a lower bound of zero core threads and/or setting allowCoreThreadTimeOut(boolean).

This means that even if your program no longer holds a reference to a ThreadPoolExecutor, it will never be reclaimed as long as at least one thread remains alive in the pool. The docs describe ways around this.
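
As a rough illustration of the workaround the quoted documentation mentions, the sketch below builds a pool whose idle threads, core threads included, exit after a keep-alive period even though shutdown() is never called. SelfReapingPool and its method names are invented for this example, and the 100 ms keep-alive is an arbitrary value chosen to keep the demo short.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SelfReapingPool {
    // Builds a pool whose idle workers time out, including core threads.
    // The keep-alive must be greater than zero for allowCoreThreadTimeOut(true).
    static ThreadPoolExecutor create(long keepAliveMillis) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, keepAliveMillis, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Runs one task WITHOUT calling shutdown(), waits, and reports how many
    // worker threads are still alive in the pool.
    static int poolSizeAfterIdle(long keepAliveMillis, long waitMillis) {
        ThreadPoolExecutor pool = create(keepAliveMillis);
        pool.execute(() -> { });
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("workers left after idling: " + poolSizeAfterIdle(100, 2000));
    }
}
```

With this configuration the JVM can exit on its own once the workers have timed out, although calling shutdown() explicitly remains the clearer option when you control the pool's lifetime.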

Smith_61 answered Jan 12 '23 11:01


It would be good if you could paste the entire code you use. Apache jclouds uses a couple of executors to perform certain tasks, and you have to close them.

Make sure you call the close() method on the context or api you get from the jclouds ContextBuilder.
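
The shape of that advice can be sketched with a stand-in class. PoolOwningContext below is hypothetical and is NOT the jclouds API; it only mimics an object that owns an internal executor and releases its threads in close(). In the asker's program, the equivalent step would presumably be calling close() on the ctx object once all tasks have finished.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a library context that owns its own thread pool.
// This is NOT jclouds code; it only models "close() releases the threads".
class PoolOwningContext implements Closeable {
    private final ExecutorService internalExecutor = Executors.newCachedThreadPool();

    void doWork() {
        internalExecutor.execute(() -> { });
    }

    @Override
    public void close() {
        // Without this call the non-daemon workers would keep the JVM alive
        // (for a cached pool, until its 60-second idle timeout expires).
        internalExecutor.shutdown();
    }

    boolean awaitTerminated(long seconds) {
        try {
            return internalExecutor.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Uses the context, closes it in a finally block, and reports whether
    // the internal pool actually terminated.
    static boolean demo() {
        PoolOwningContext ctx = new PoolOwningContext();
        try {
            ctx.doWork();
        } finally {
            ctx.close();
        }
        return ctx.awaitTerminated(5);
    }

    public static void main(String[] args) {
        System.out.println("internal pool terminated: " + demo());
    }
}
```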

Ignasi Barrera answered Jan 12 '23 11:01