 

General guide to diagnose ANR [duplicate]

There are plenty of questions that include an ANR traces file, and the answer is always "oh, the problem is in your thread 76, fix your HTTP call" or something :) But I couldn't find any general guide or tutorial on how to read these traces, step by step, for any ANR. Is there one? I have a few questions in particular:

  1. Is it always possible to see the problem from the thread traces I get for real-world ANRs in the Google Play Console? Or is it possible that there is simply no relevant info, and I am out of luck if I can't reproduce the ANR locally?

  2. Which threads are included in this information? I suppose all threads from my app process are there, but what about the rest? Are they all relevant to me in some way (for example, threads that some of my threads are waiting for)? Or are there also completely unrelated processes?

  3. How does the Google Play Console determine the "place" where the ANR happened, which is then displayed in the list of ANRs? For example:

ANR keyDispatchingTimedOut

location: com.sample.myapp/myapp.activities.SplashActivity

Because SplashActivity is nowhere to be seen in the supplied text of the thread traces.

  4. I know that I should look for threads in the WAIT state for potential deadlocks etc. What about the situation where a thread is "waiting on itself"?

    "AsyncTask #1" prio=5 tid=15 WAIT
      | group="main" sCount=1 dsCount=0 obj=0x41bb50c0 self=0x5529a868
      | sysTid=2448 nice=0 sched=0/0 cgrp=apps handle=1429609576
      | state=S schedstat=( 18097077 39273309 41 ) utm=1 stm=0 core=1
      at java.lang.Object.wait(Native Method)
      - waiting on <0x41bb5258> (a java.lang.VMThread) held by tid=15 (AsyncTask #1)

Is this always OK, so that I can assume it is not the cause? What about the situation where I have only a bunch of threads in NATIVE (including the main thread) and a bunch of threads in WAIT, waiting on themselves like this? How can this be an ANR?

rouen asked Mar 09 '15 09:03




1 Answer

The system sends various events to your application, which are received on the UI thread. If that thread doesn't respond to the events within a certain period of time, the system concludes that the app is unresponsive, and initiates the ANR handling.
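The idea of "an event goes unanswered for too long" can be sketched with a toy model. This is not how Android implements ANR detection; it only mimics the shape of it in plain Python. `EventLoopThread` and `responds_within` are hypothetical names invented for this illustration, and the timeouts are arbitrary.

```python
import queue
import threading
import time


class EventLoopThread(threading.Thread):
    """Toy stand-in for a UI thread: runs posted callables one at a time."""

    def __init__(self):
        super().__init__(daemon=True)
        self._events = queue.Queue()

    def post(self, fn):
        """Queue a callable to run on the loop thread."""
        self._events.put(fn)

    def stop(self):
        """Ask the loop to exit once earlier events have been handled."""
        self._events.put(None)

    def run(self):
        while True:
            fn = self._events.get()
            if fn is None:  # sentinel posted by stop()
                return
            fn()


def responds_within(loop, timeout_s):
    """Post a ping event and report whether the loop handled it in time."""
    handled = threading.Event()
    loop.post(handled.set)
    return handled.wait(timeout_s)


if __name__ == "__main__":
    loop = EventLoopThread()
    loop.start()
    print(responds_within(loop, timeout_s=1.0))  # idle loop: ping is handled
    loop.post(lambda: time.sleep(0.5))           # simulate slow work on the loop
    print(responds_within(loop, timeout_s=0.1))  # ping is queued behind the slow work
    loop.stop()
```

Note that the second check fails even though the loop is not deadlocked, only busy. If a stack dump were taken after the sleep finished, it would show an idle loop, which is the "app recovered before the dump" situation described in point 1 below.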

Addressing your question point by point:

  1. It's not always possible to see the problem in the stack trace. The system server process detects that a problem exists, then signals the problematic process to dump its stack traces. If the app recovered between the problem discovery and stack dump signal, then the traces won't tell you much.

  2. You should see all threads from your app, and your app only. The ANR mechanism does not attempt to determine a set of "relevant" threads. The place to start is the UI thread, usually the app's "main" thread, to see if you have caught it in the act of being stuck. Sometimes the app is slow, not stuck, and the cause of the slowness is actually a different process that is soaking the CPU or disk bandwidth, but you can't see that in a stack trace... and you will likely get a stack trace that reflects execution past the point where it was "stuck".

  3. The "place" is the event that was not responded to (in this case, a key event), and the Activity that the system was attempting to interact with.

  4. That's normal; you'll see that when a thread is "parked" via java.util.concurrent.locks.LockSupport.park() in Dalvik. Remember that the lock is released while the thread is waiting, so in this case it's just waiting for another thread to come along and notify it.
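A first pass over a traces file can be automated. The following is a minimal sketch, assuming the Dalvik-style text layout shown in the question (one header line per thread, then `at ...` frames and `- waiting ...` lines); other runtime versions format these lines differently. `ThreadInfo`, `parse_threads` and the `SAMPLE` dump are made up for illustration, and the "main" thread in the sample is invented, not taken from the question.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Header line of one thread, e.g.:  "AsyncTask #1" prio=5 tid=15 WAIT
HEADER = re.compile(
    r'^"(?P<name>[^"]+)"(?: daemon)? prio=\d+ tid=(?P<tid>\d+) (?P<state>\w+)'
)
# Monitor line, e.g.:  - waiting on <0x41bb5258> (...) held by tid=15 (AsyncTask #1)
HELD_BY = re.compile(r"held by tid=(?P<holder>\d+)")


@dataclass
class ThreadInfo:
    name: str
    tid: int
    state: str
    frames: List[str] = field(default_factory=list)
    held_by: Optional[int] = None  # tid named on a "- waiting ..." line, if any

    @property
    def waits_on_self(self) -> bool:
        """True when the monitor it waits on is reported as held by itself."""
        return self.held_by == self.tid


def parse_threads(dump: str) -> List[ThreadInfo]:
    """Split a dump into threads, keeping state, frames and monitor holder."""
    threads: List[ThreadInfo] = []
    for raw in dump.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            threads.append(
                ThreadInfo(header["name"], int(header["tid"]), header["state"])
            )
        elif threads and line.startswith("at "):
            threads[-1].frames.append(line[3:])
        elif threads and line.startswith("- waiting"):
            holder = HELD_BY.search(line)
            if holder:
                threads[-1].held_by = int(holder["holder"])
    return threads


# Hypothetical two-thread dump; the "main" entry is invented for the example.
SAMPLE = '''
"main" prio=5 tid=1 NATIVE
  | group="main" sCount=1 dsCount=0 obj=0x4159fe58 self=0x41590a10
  at android.os.MessageQueue.nativePollOnce(Native Method)
  at android.os.MessageQueue.next(MessageQueue.java:138)

"AsyncTask #1" prio=5 tid=15 WAIT
  | group="main" sCount=1 dsCount=0 obj=0x41bb50c0 self=0x5529a868
  at java.lang.Object.wait(Native Method)
  - waiting on <0x41bb5258> (a java.lang.VMThread) held by tid=15 (AsyncTask #1)
'''

if __name__ == "__main__":
    for t in parse_threads(SAMPLE):
        note = " (waiting on itself: parked, not deadlocked)" if t.waits_on_self else ""
        print(f"{t.name}: {t.state}{note}")
```

Starting from the entry named "main" and setting aside threads whose `waits_on_self` is true follows the advice in points 2 and 4: those threads are only parked, so they are unlikely to be the cause.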

Addressing a point raised in the comments: it's possible for a native crash to cause an ANR if (1) the native crash doesn't kill the app entirely, which is what it's supposed to do; and (2) the thread that died was the UI thread, or held a resource that the UI thread was waiting for. If you don't have access to the full logcat, you can check the thread list to confirm that all of your threads are alive.
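Checking the thread list by hand gets tedious for large dumps. As a rough sketch, assuming the same Dalvik-style header lines as in the question, the names you expect can be compared against the names present in the dump. `missing_threads` and the thread names in the usage line are hypothetical.

```python
import re

# Matches the quoted name at the start of each thread header line,
# e.g.:  "Binder_1" prio=5 tid=9 NATIVE
THREAD_NAME = re.compile(r'^\s*"([^"]+)"(?: daemon)? prio=\d+ tid=\d+', re.MULTILINE)


def missing_threads(dump, expected_names):
    """Return the expected thread names that do not appear in the dump."""
    present = set(THREAD_NAME.findall(dump))
    return sorted(set(expected_names) - present)
```

For example, `missing_threads(dump_text, ["main", "MyWorker"])` returns `["MyWorker"]` if no thread with that name was dumped, which is a hint that the thread had already died when the traces were taken.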

When looking at an ANR, the first thing you need to figure out is if it's permanently stuck or just temporarily slowed. This should be obvious to the person using the app. Permanent freezes are usually the easiest to solve, as the stack trace will generally lead you to what went wrong. Start with the UI thread and walk through the trace until you find some bit of code that is spinning or stuck in a native call. (There's a trick with native calls though -- if it says NATIVE then it's still in native code, but if it says SUSPENDED on a thread with a native method at the top of the stack, then it's not stuck, but rather in the act of returning from native to managed code.)
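The NATIVE versus SUSPENDED rule from the parenthetical can be written down as a small helper. This is only a sketch of that one rule, assuming that native frames are marked with `(Native Method)` as in the question's trace; `describe_native_status` is a hypothetical name, and the returned strings are just labels.

```python
def describe_native_status(state: str, top_frame: str) -> str:
    """Apply the NATIVE/SUSPENDED rule of thumb to one thread.

    state:     thread state from the header line, e.g. "NATIVE"
    top_frame: first "at ..." frame of the thread, without the "at " prefix
    """
    top_is_native = top_frame.endswith("(Native Method)")
    if state == "NATIVE":
        return "still in native code"
    if state == "SUSPENDED" and top_is_native:
        return "returning from native to managed code"
    return "rule does not apply; read the managed frames"
```

A thread reported as SUSPENDED with `java.lang.Object.wait(Native Method)` on top would therefore be labelled as returning from native code rather than stuck in it.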

Transient ANRs can be harder, especially if they're happening on customer devices whose configuration is unknown. If they're running CPU benchmarks in the background on a device that's stalling because of a failing flash part, your app is going to have a bad time. Sometimes the stack trace points you in the general direction of the problem (e.g. this one, where it appears slow rendering and coarse locking were stalling the UI thread), other times the trace is captured after the app is back to running normally.

fadden answered Sep 17 '22 12:09