
Intercepting crashes on iOS

Tags: logging, ios, swift

Description

I would like to catch all exceptions that occur in an iOS app, log them to a file, and eventually send them to the back-end server used by the app.

I've been reading about this topic and found that one approach is to handle the signals sent by the device, but I'm not sure whether that would break the App Store Review Guidelines or introduce additional issues.

I've added the following to AppDelegate:

NSSetUncaughtExceptionHandler { (exception) in  
    log.error(exception)  
}  

signal(SIGABRT) { s in  
    log.error(Thread.callStackSymbols.prettified())  
    exit(s)  
}  

signal(SIGILL) { s in  
    log.error(Thread.callStackSymbols.prettified())  
    exit(s)  
}  

signal(SIGSEGV) { s in  
    log.error(Thread.callStackSymbols.prettified())  
    exit(s)  
}

Questions

  • Is this a good approach? Is there any other way?
  • Will it break the App Store Review Guidelines because of the usage of exit()?
  • Is it better to use kill(getpid(), SIGKILL) instead of exit()?

Resources

  • https://github.com/zixun/CrashEye/blob/master/CrashEye/Classes/CrashEye.swift
  • https://www.plcrashreporter.org/
  • https://chaosinmotion.blog/2009/12/02/a-useful-little-chunk-of-iphone-code/
Najdan Tomić asked Nov 27 '22 06:11


2 Answers

Former Crashlytics iOS SDK maintainer here.

The code you've written above does have a number of technical issues.

The first is there are actually very few functions that are defined as safe to invoke inside a signal handler. man sigaction lists them. The code you've written is not signal-safe and will deadlock from time to time. It all will depend on what the crashed thread is doing at the time.

The second is you are attempting to just exit the program after your handler. You have to keep in mind that signals/exception handlers are process-wide resources, and you might not be the only one using them. You have to save pre-existing handlers and then restore them after handling. Otherwise, you can negatively affect other systems the app might be using. As you've currently written this, even Apple's own crash reporter will not be invoked. But, perhaps you want this behavior.
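As a rough illustration of the "save and restore" point, here is a minimal C sketch. It assumes a single signal is being watched, and the names install_chaining_handler, chaining_handler and g_previous_action are made up for this example. It is not how Crashlytics or any other reporter actually does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Disposition that was installed before ours (hypothetical name). */
static struct sigaction g_previous_action;
/* Flag written from the handler; sig_atomic_t keeps that write safe. */
static volatile sig_atomic_t g_handler_ran;

static void chaining_handler(int sig) {
    static const char msg[] = "crash signal caught\n";
    /* write(2) is on the async-signal-safe list; printf and NSLog are not. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    g_handler_ran = 1;
    /* Give the signal back to its previous owner, then simply return.
       After a real fault the faulting instruction runs again, so the
       previous owner (or the default action) sees the same crash. */
    sigaction(sig, &g_previous_action, NULL);
}

/* Call once at startup, outside any handler. Returns 0 on success. */
int install_chaining_handler(int sig) {
    struct sigaction action;
    assert(sig > 0);
    memset(&action, 0, sizeof action);
    action.sa_handler = chaining_handler;
    sigemptyset(&action.sa_mask);
    /* The third argument receives whatever was installed before us. */
    return sigaction(sig, &action, &g_previous_action);
}
```

Because only one previous disposition is stored, watching several signals would need one saved struct sigaction per signal.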

Third, you aren't capturing all threads' stacks. This is critical information for a crash report, but adds a lot of complexity.

Fourth, signals actually aren't the lowest-level error system. Mach exceptions (not to be confused with runtime exceptions, i.e. NSException) are the underlying mechanism used to implement signals on iOS. They are a much more robust system, but are also far more complex. Signals have a bunch of pitfalls and limitations that Mach exceptions get around.

These are just the issues that come to me off the top of my head. Crash reporting is tricky business. But, I don't want you to think it's magic, of course it's not. You can build a system that works.

One thing I do want to point out, is that crash reporters give you no feedback on failure. So, you might build something that works 25% of the time, and because you are only seeing valid reports, you think "hey, this works great!". Crashlytics had to put in effort over many years to identify the causes of failure and try to mitigate them. If this is all interesting to you, you can check out a talk I did about the Crashlytics system.

Update:

So, what would happen if you ship this code? Well, sometimes you'll get useful reports. Sometimes, your crash handling code will itself crash, which will cause an infinite loop. Sometimes your code will deadlock, and effectively hang your app.

Apple has made exit public API (for better or worse), so you are absolutely within the rules to use it.

I would recommend continuing down this road for learning purposes only. If you have a real app that you care about, I think it would be more responsible to integrate an existing open-source reporting system and point it to a backend server that you control. No 3rd parties, but also no need to worry about doing more harm than good.

Mattie answered Dec 09 '22 17:12


Conclusion

It is possible to create a custom crash reporter, but it is definitely not recommended: a lot goes on in the background that is easy to overlook, and mistakes can introduce undefined behavior. Even third-party frameworks can be troublesome, but using one is generally the better way to go.

Thanks to everyone for providing information regarding this topic.

Answers to questions

Is this good approach, any other way?

The approach in my original question interferes with Apple's own crash reporter and introduces undefined behavior because it handles signals incorrectly. UNIX signals don't cover every kind of error, and a signal handler may only call async-signal-safe functions. Mach exception handling, which Apple's crash reporter uses, is a better option, but it is more complex.

Will usage of exit() break Apple App Store review?

No. The prohibition on calling exit() applies to the normal operation of the app. If the app is crashing anyway, calling exit() isn't a problem.

Is it better to use kill(getpid(), SIGKILL) instead of exit()?

Quote from Eskimo:

You must not call exit. There’s two problems with doing that:

  • exit is not async signal safe. In fact, exit can run arbitrary code via handlers registered with atexit. If you want to exit the process, call _exit.
  • Exiting the process is a bad idea anyway, because it will either prevent the Apple crash reporter from running or cause it to log incorrect state (the state of your signal handler rather than the state of the crashed thread).

A better solution is to unregister your signal handler (set it to SIG_DFL) and then return.


Additional details (full context)

Since I cross-posted this question to Apple's official support forum and got a really long and descriptive answer from the well-known Eskimo, I would like to share it with anyone who goes down the same path as I did and starts researching this approach.

Quote from Eskimo

Before we start I’d like you to take a look at my shiny new Implementing Your Own Crash Reporter post. I’ve been meaning to write this up for a while, and your question has given me a good excuse to allocate the time.

You wrote:

I've got a requirement to catch all exceptions that are occurring in iOS app and log them to file and eventually send them to back-end server used by the app.

I strongly recommend against doing this. My Implementing Your Own Crash Reporter post explains why this is so hard. It also has some suggestions for how to avoid problems, but ultimately there’s no way to implement a third-party crash reporter that’s reliable, binary compatible, and sufficient to debug complex problems.

With that out of the way, let’s look at your specific questions:

Is this good approach at all?

No. The issue is that your minimalist crash reporter will disrupt the behaviour of the Apple crash reporter. The above-mentioned post discusses this problem in gory detail.

Will it break App Store Review guidelines because of usage of exit()?

No. iOS’s prohibition against calling exit is all about the normal operation of your app. If your app is crashing anyway, calling exit isn’t a problem.

However, calling exit will exacerbate the problem I covered in the previous point.

Is it better to use kill(getpid(), SIGKILL) instead?

That won’t improve things substantially.

callStackSymbols are not symbolicated, is there a way to symbolicate callStackSymbols?

No. On-device symbolication is extremely tricky and should be avoided. Again, I go into this in detail in the post referenced above.

Share and Enjoy


Since links can break I will also quote post.

Implementing Your Own Crash Reporter

I often get questions about third-party crash reporting. These usually show up in one of two contexts:

  • Folks are trying to implement their own crash reporter.
  • Folks have implemented their own crash reporter and are trying to debug a problem based on the report it generated.

This is a complex issue and this post is my attempt to untangle some of that complexity.

If you have a follow-up question about anything I've raised here, please start a new thread.

IMPORTANT All of the following is my own direct experience. None of it should be considered official DTS policy. If you have questions that need an official answer (perhaps you’re trying to convince your boss that implementing your own crash reporter is a very bad idea :-), you should open a DTS tech support incident and we can discuss things there.

Share and Enjoy
Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"


Scope

First, I can only speak to the technical side of this issue. There are other aspects that are beyond my remit:

  • I don’t work for App Review, and only they can give definitive answers about what will or won’t be allowed on the store.
  • Doing your own crash reporter has significant privacy implications.

IMPORTANT If you implement your own crash reporter, discuss the privacy impact with a lawyer.

This post assumes that you are implementing your own crash reporter. A lot of folks use a crash reporter from another third party. From my perspective these are the same thing. If you use a custom crash reporter, you are responsible for its behaviour, both good and bad, regardless of where the actual code came from.

Note If you use a crash reporter from another third party, run the tests outlined in Preserve the Apple Crash Report to verify that it’s working well.

General Advice

I strongly advise against implementing your own crash reporter. It’s very easy to implement a basic crash reporter that works well enough to debug simple problems. It’s impossible to create a good crash reporter, one that’s reliable, binary compatible, and sufficient to debug complex problems.

“Impossible?”, I hear you ask, “That’s a very strong word for Quinn to use. He’s usually a lot more circumspect.” And yes, that’s true, I usually am more circumspect, but in this case I’m extremely confident of this conclusion.

There are two fundamental problems with implementing your own crash reporter:

  • On iOS (and the other iOS-based platforms, watchOS and tvOS) your crash reporter must run inside the crashed process. That means it can never be 100% reliable. If the process is crashing then, by definition, it’s in an undefined state. Attempting to do real work in that state is just asking for problems [1].

  • To get good results your crash reporter must be intimately tied to system implementation details. These can change from release to release, which invalidates the assumptions made by your crash reporter. This isn’t a problem for the Apple crash reporter because it ships with the system. However, a crash reporter that’s built in to your product is always going to be brittle.

    I’m speaking from hard-won experience here. I worked for DTS during the PowerPC-to-Intel transition, and saw a lot of folks with custom crash reporters struggle through that process.

Still, this post exists because lots of folks ignore my general advice, so the subsequent sections contain advice about specific technical issues.

WARNING Do not interpret any of the following as encouragement to implement your own crash reporter. I strongly advise against that. However, if you ignore my advice then you should at least try to minimise the risk, which is what the rest of this document is about.

[1] On macOS it’s possible for your crash reporter to run out of process, just like the Apple crash reporter. However, that presents its own problems: When running out of process you can’t access various bits of critical state for the crashed process without being tightly bound to implementation details that are not considered API.

Preserve the Apple Crash Report

You must ensure that your crash reporter doesn’t disrupt the Apple crash reporter. Some fraction of your crashes will not be caused by your code but by problems in framework code, and a poorly written crash reporter will disrupt the Apple crash reporter and make it harder to diagnose those issues.

Additionally, when dealing with really hard-to-debug problems, you really need the more obscure info that’s shown in the Apple crash report. If you disrupt that info, you end up making the hard problems harder.

To avoid these issues I recommend that you test your crash reporter’s impact on the Apple crash reporter. The basic idea is:

  1. Create a program that generates a set of specific crashes.
  2. Run through each crash.
  3. Verify that your crash reporter produces sensible results.
  4. Verify that the Apple crash reporter also produces sensible results.

With regards step 1, your test suite should include:

  • An un-handled language exception thrown by your code
  • An un-handled language exception thrown by the OS (accessing an NSArray out of bounds is an easy way to get this)
  • A memory access exception
  • An illegal instruction exception
  • A breakpoint exception

Make sure to test all of these cases on both the main thread and a secondary thread.
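For the crash types that can be triggered from plain C, a test program along these lines might be a starting point. This is only a sketch: the function names are invented here, __builtin_trap is a GCC/Clang builtin, the signal each crash produces varies by CPU and compiler, and the fork-based helper is a desktop-only sanity check (on a device you would call one crash function directly and then inspect the reports). Language exceptions need Objective-C or Swift and are not covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*crash_fn)(void);

/* Memory access exception: store through a null pointer.
   volatile keeps the compiler from dropping the store. */
void crash_memory_access(void) {
    volatile int *p = NULL;
    *p = 42;
}

/* Illegal instruction or breakpoint-style trap, depending on the target. */
void crash_trap(void) {
    __builtin_trap();
}

/* Abort, as raised by failed assertions and uncaught exceptions. */
void crash_abort(void) {
    abort();
}

/* Desktop-only helper: runs fn in a child process and returns the number
   of the signal that killed it, 0 if it exited normally, -1 on error. */
int signal_from_crash(crash_fn fn) {
    int status = 0;
    pid_t pid;
    assert(fn != NULL);
    pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        fn();
        _exit(0);   /* only reached if fn did not crash */
    }
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

To cover the secondary-thread case, the same crash functions can be invoked from a thread created for the purpose.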

With regards step 4, check that the resulting Apple crash report includes correct values for:

  • The exception info
  • The crashed thread
  • That thread’s state
  • Any application-specific info, and especially the last exception backtrace

Signals

Many third-party crash reporters use UNIX signals to catch the crash. This is a shame because using Mach exception handling, the mechanism used by the Apple crash reporter, is generally a better option. However, there are two reasons to favour UNIX signals over Mach exception handling:

  • On iOS-based platforms your crash reporter must run in-process, and doing in-process Mach exception handling is not feasible.
  • Folks are a lot more familiar with UNIX signals. Mach exception handling, and Mach messaging in general, is pretty darned obscure.

If you use UNIX signals for your crash reporter, be aware that this API has some gaping pitfalls. First and foremost, your signal handler can only use async signal safe functions [1]. You can find a list of these functions in the sigaction man page [2].

WARNING This list does not include malloc. This means that a crash reporter’s signal handler cannot use Objective-C or Swift, as there’s no way to constrain how those language runtimes allocate memory. That means you’re stuck with C or C++, but even there you have to be careful to comply with this constraint.
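To make the constraint concrete, here is a small C sketch of formatting a value without printf or malloc, so that only write is called at crash time. The helper names (format_hex64, write_hex_line, format_hex64_self_check) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Writes "0x" plus 16 hex digits and a terminating NUL into out.
   Needs at least 19 bytes; returns 18 on success, 0 if out is too small.
   Calls no library functions, so it can run inside a signal handler. */
size_t format_hex64(uint64_t value, char *out, size_t out_size) {
    static const char digits[] = "0123456789abcdef";
    int i;
    if (out_size < 19) return 0;
    out[0] = '0';
    out[1] = 'x';
    for (i = 0; i < 16; i++) {
        out[2 + i] = digits[(value >> (60 - 4 * i)) & 0xf];
    }
    out[18] = '\0';
    return 18;
}

/* Emits one hex line to fd using only write(2), which is on the list. */
void write_hex_line(int fd, uint64_t value) {
    char line[20];
    size_t len = format_hex64(value, line, sizeof line);
    ssize_t ignored;
    line[len] = '\n';   /* replaces the NUL; the buffer has room for it */
    ignored = write(fd, line, len + 1);
    (void)ignored;
}

/* Startup-time check only. assert is NOT async-signal-safe, so this must
   never be called from a handler. */
void format_hex64_self_check(void) {
    char buf[19];
    assert(format_hex64(0xdeadbeefULL, buf, sizeof buf) == 18);
    assert(buf[0] == '0' && buf[1] == 'x' && buf[17] == 'f');
}
```

A handler could then call write_hex_line(fd, address) for each value it wants to record, with fd opened ahead of time.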

The Operative: It’s worse than you know.

Many crash reporters use functions like backtrace (see its man page) to get a backtrace from their signal handler. There’s two problems with this:

  • backtrace is not an async signal safe function.
  • backtrace uses a naïve algorithm that doesn’t deal well with cross signal handler stack frames [3].

The latter example is particularly worrying, because it hides the identity of the stack frame that triggered the signal.

If you’re going to backtrace out of a signal, you must use the crashed thread’s state (accessible via the handler’s uap parameter) to start your backtrace.

Apropos that, if your crash reporter wants to log the state of the crashed thread, that’s the place to get it.
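The uap parameter is only delivered to handlers registered with SA_SIGINFO. A minimal C sketch of such a registration follows; the names context_handler, install_context_handler and the g_ variables are made up here. The register layout inside the context is platform specific, so this sketch only notes that the context arrived and leaves the actual register reads out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t g_signal_number;
static volatile sig_atomic_t g_had_context;
/* A pointer-sized store; this sketch assumes that is atomic on the target. */
static void *volatile g_fault_address;

/* Three-argument handler form, selected by SA_SIGINFO below. */
static void context_handler(int sig, siginfo_t *info, void *uap) {
    g_signal_number = sig;
    /* For memory faults si_addr is the address whose access failed. */
    g_fault_address = (info != NULL) ? info->si_addr : NULL;
    /* uap points at a ucontext_t describing the interrupted thread. A real
       reporter would read the program counter and frame pointer from it
       here and start its backtrace from those values. */
    g_had_context = (uap != NULL);
}

/* Call at startup, outside any handler. Returns 0 on success. */
int install_context_handler(int sig) {
    struct sigaction action;
    assert(sig > 0);
    memset(&action, 0, sizeof action);
    action.sa_sigaction = context_handler;   /* note: not sa_handler */
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(sig, &action, NULL);
}
```

The context is only valid while the handler is running, so anything needed from it has to be copied out before returning.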

Finally, there’s the question of how to exit from your signal handler. You must not call exit. There’s two problems with doing that:

  • exit is not async signal safe. In fact, exit can run arbitrary code via handlers registered with atexit. If you want to exit the process, call _exit.
  • Exiting the process is a bad idea anyway, because it will either prevent the Apple crash reporter from running or cause it to log incorrect state (the state of your signal handler rather than the state of the crashed thread).

A better solution is to unregister your signal handler (set it to SIG_DFL) and then return. This will cause the crashed process to continue execution, crash again, and generate a crash report via the Apple crash reporter.
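In C, "unregister and return" could look roughly like the sketch below. The names crash_handler, install_crash_handler and g_report_written are invented for this example, and the report-writing step is left as a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t g_report_written;

static void crash_handler(int sig) {
    /* ... record the crash here using async-signal-safe calls only ... */
    g_report_written = 1;
    /* Unregister and return. No exit() and no _exit(): when the handler
       returns, the faulting instruction runs again, the signal is raised
       with the default action, and the system crash reporter takes over.
       signal() itself is on the async-signal-safe list. */
    signal(sig, SIG_DFL);
}

/* Call at startup for each signal of interest. Returns 0 on success. */
int install_crash_handler(int sig) {
    struct sigaction action;
    assert(sig > 0);
    memset(&action, 0, sizeof action);
    action.sa_handler = crash_handler;
    sigemptyset(&action.sa_mask);
    return sigaction(sig, &action, NULL);
}
```

For the signals in the original question this would be called for SIGABRT, SIGILL and SIGSEGV. If another component had already installed handlers, restoring its saved disposition instead of SIG_DFL would be the more considerate choice.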

[1] While the common signals caught by a crash reporter are not technically async signals (except SIGABRT), you still have to treat them as async signals because they can occur on any thread at any time.

[2] It’s reasonable to extend this list to other routines that are implemented as thin shims on a system call. For example, I have no qualms about calling vm_read (see below) from a signal handler.

[3] Cross signal handler stack frames are pushed on to the stack by the kernel when it runs a signal handler on a thread. As there’s no API to learn about the structure of these frames, there’s no way to backtrace across one of these frames in isolation. I’m happy to go into details but it’s really not relevant to this discussion. If you’re interested, start a new thread and we can chat there.

Reading Memory

A signal handler must be very careful about the memory it touches, because the contents of that memory might have been corrupted by the crash that triggered the signal. My general rule here is that the signal handler can safely access:

  • Its code
  • Its stack
  • Its arguments
  • Immutable global state

In the last point, I’m using immutable to mean immutable after startup. I think it’s reasonable to set up some global state when the process starts, before installing your signal handler, and then rely on it in your signal handler.

Changing any global state after the signal handler is installed is dangerous, and if you need to do that you must be careful to ensure that your signal handler sees a consistent state, even though a crash might occur halfway through your change.

Note that you can’t protect this global state with a mutex because mutexes are not async signal safe (and even if they were you’d deadlock if the mutex was held by the thread that crashed). You should be able to use atomic operations for this, but atomic operations are notoriously hard to use correctly (if I had a dollar for every time I’ve pointed out to a developer they’re using atomic operations incorrectly, I’d be very badly paid (-: but that’s still a lot of developers!).

If your signal handler reads other memory, it must take care to avoid crashing while doing that read. There’s no BSD-level API for this [1], so I recommend that you use vm_read.

[1] The traditional UNIX approach for doing this is to install a signal handler to catch any memory exceptions triggered by the read, but now we’re talking signal handling within a signal handler and that’s just silly.

Writing Files

If you want to write a crash report from your signal handler, you must use low-level UNIX APIs (open, write, close) because only those low-level APIs are documented to be async signal safe. You must also set up the path in advance because the standard APIs for determining where to write the file (NSFileManager, for example) are not async signal safe.
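A minimal C sketch of that split between startup work and handler work is shown below. The function names and the fixed 1024-byte path buffer are assumptions made for this example; on iOS the path itself would come from NSFileManager or similar at launch, before any handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Filled in once at startup and treated as read-only afterwards. */
static char g_report_path[1024];

/* Startup side: may use any API. Returns 0 on success, -1 if too long. */
int crash_report_set_path(const char *path) {
    size_t len;
    assert(path != NULL);
    len = strlen(path);
    if (len >= sizeof g_report_path) return -1;
    memcpy(g_report_path, path, len + 1);
    return 0;
}

/* Handler side: only open, write and close. Returns 0 on success. */
int crash_report_write(const char *data, size_t len) {
    size_t done = 0;
    int fd = open(g_report_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n <= 0) {   /* this sketch treats any short-circuit as failure */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

The data passed to crash_report_write has to be built without allocation as well, for example in a stack buffer inside the handler.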

Offline Symbolication

Do not attempt to do symbolication from your signal handler. Rather, write enough information to your crash report to support offline symbolication. Specifically:

  • The addresses to symbolicate
  • For each Mach-O image in the process:
    • The image path
    • The image UUID
    • The image load address

You can get most of the Mach-O image information using the APIs in <mach-o/dyld.h> [1]. Be aware, however, that these APIs are not async signal safe. You’ll need to get this information in advance and cache it for your signal handler to record.

This is complicated by the fact that the list of Mach-O images can change as your process loads and unloads code. This requires you to share mutable state with your signal handler, which is exactly what I recommend against in Reading Memory.

Note You can learn about images loading and unloading using _dyld_register_func_for_add_image and _dyld_register_func_for_remove_image respectively.

[1] I believe you’ll need to parse the Mach-O load commands to get the image UUID.
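As a rough sketch of caching that image information ahead of time, the C below keeps a fixed-size table that a handler could later dump without calling into dyld. The table layout, its limits and all function names are assumptions for this example. The dyld part only compiles on Apple platforms, handles 64-bit images only, and takes a one-off snapshot at startup, so it does not address the load/unload problem described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_TABLE_MAX 512   /* hypothetical limits for this sketch */
#define IMAGE_PATH_MAX  256

typedef struct {
    char      path[IMAGE_PATH_MAX];
    uint8_t   uuid[16];
    uintptr_t load_address;
} image_record;

/* Filled at startup; the signal handler only reads it. */
static image_record g_images[IMAGE_TABLE_MAX];
static size_t g_image_count;

/* Startup side. Returns 0 on success, -1 if the table is full. */
int record_image(const char *path, const uint8_t uuid[16], uintptr_t load_address) {
    image_record *rec;
    size_t len;
    assert(path != NULL && uuid != NULL);
    if (g_image_count >= IMAGE_TABLE_MAX) return -1;
    rec = &g_images[g_image_count];
    len = strlen(path);
    if (len >= IMAGE_PATH_MAX) len = IMAGE_PATH_MAX - 1;   /* truncate */
    memcpy(rec->path, path, len);
    rec->path[len] = '\0';
    memcpy(rec->uuid, uuid, 16);
    rec->load_address = load_address;
    g_image_count++;
    return 0;
}

#if defined(__APPLE__)
#include <mach-o/dyld.h>
#include <mach-o/loader.h>

/* Snapshot of the images loaded right now. Not async-signal-safe. */
void cache_loaded_images(void) {
    uint32_t count = _dyld_image_count();
    uint32_t i, c;
    for (i = 0; i < count; i++) {
        const struct mach_header *mh = _dyld_get_image_header(i);
        const char *name = _dyld_get_image_name(i);
        const struct load_command *lc;
        uint8_t uuid[16] = {0};
        if (mh == NULL || name == NULL || mh->magic != MH_MAGIC_64) continue;
        /* Load commands start right after the 64-bit header. */
        lc = (const struct load_command *)((const struct mach_header_64 *)mh + 1);
        for (c = 0; c < mh->ncmds; c++) {
            if (lc->cmd == LC_UUID) {
                memcpy(uuid, ((const struct uuid_command *)lc)->uuid, 16);
                break;
            }
            lc = (const struct load_command *)((const char *)lc + lc->cmdsize);
        }
        record_image(name, uuid, (uintptr_t)mh);
    }
}
#endif
```

At crash time the handler would write each record's path, UUID and load address next to the raw addresses, leaving symbolication to an offline tool.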

Najdan Tomić answered Dec 09 '22 17:12