
How to stop exception alerts from going berserk

Let's say you have a .NET system that needs to send out email notifications to a system administrator when there's an error. Example:

try
{
    //do something mission critical 
}
catch(Exception ex)
{
    //send ex to the system administrator
    //give the customer a user-friendly explanation
} 

This block of code gets called hundreds of times a second by different users.

Now let's say an underlying API/service/database goes down. This code is going to fail many, many times. The poor administrator is going to wake up to a few million e-mails in their inbox and the developer is going to get a rude phone call, not that such an incident (cough) necessarily occurred this morning.

It's pretty clear that this is not a design that scales well.

The first few solutions that come to mind are all flawed in some way:

  • Log errors to the database, then expose high error counts through an HTTP Health Check to an external monitoring service such as Pingdom. (My favourite candidate so far. But what if the database goes down?)
  • Have a static cache that keeps track of recent exceptions, and have the alert system always check it for duplicates first. (Seems unnecessarily complex, and a lot of error messages differ very slightly - e.g. if there is a time-stamp in the error, exact-match de-duplication is useless.)
  • Programmatically take our system offline after certain errors or based on constant monitoring of critical dependencies (Risky! What if there's a transient false positive?)
  • Just not alert on those errors, and rely on a different part of the system to monitor and report on the dependencies. (Doesn't cater for the 'unexpected' errors that we haven't anticipated.)
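For the first option, one way around the "what if the database goes down?" worry might be to keep the recent error count in process memory and let the health check read it from there. The following is only a minimal sketch under that assumption: `ErrorRateMonitor`, its window and threshold parameters, and the injectable clock are all hypothetical names, not part of any existing library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts errors inside a sliding time window, in memory only.
public sealed class ErrorRateMonitor
{
    private readonly Queue<DateTime> _errors = new Queue<DateTime>();
    private readonly object _gate = new object();
    private readonly TimeSpan _window;
    private readonly int _threshold;
    private readonly Func<DateTime> _clock;

    // clock is injectable so the class can be tested; it defaults to UTC now.
    public ErrorRateMonitor(TimeSpan window, int threshold, Func<DateTime> clock = null)
    {
        _window = window;
        _threshold = threshold;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    // Call this from the catch block instead of sending an e-mail.
    public void RecordError()
    {
        lock (_gate)
        {
            DateTime now = _clock();
            _errors.Enqueue(now);
            Trim(now);
        }
    }

    // A health-check endpoint would report failure when this returns false.
    public bool IsHealthy()
    {
        lock (_gate)
        {
            Trim(_clock());
            return _errors.Count < _threshold;
        }
    }

    // Drops entries that have fallen out of the window. Caller must hold _gate.
    private void Trim(DateTime now)
    {
        DateTime cutoff = now - _window;
        while (_errors.Count > 0 && _errors.Peek() < cutoff)
        {
            _errors.Dequeue();
        }
    }
}
```

The external monitor then raises one alert when the health check starts failing, however many individual exceptions were recorded. The counts are lost on a process restart, which is probably acceptable for alerting but not for auditing.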

This seems like a problem that must already have been solved, and it feels like we're going about it in a silly way. Suggestions appreciated, even if they involve a completely different exception management strategy!

realworldcoder asked Oct 28 '10 15:10



2 Answers

The simplest solution that springs to mind is to assign this exception block an ID number (like, 1) and log the time of the last notification to the administrator. If the elapsed time since the last notification is not large enough (say, an hour), don't notify the admin again.

If this piece of code typically generates more than one kind of exception, you may want to log the class of the exception as well; if the elapsed time since the last notification for the same exception class is not large enough, don't notify the admin again.
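As a rough illustration of that idea, the throttle could look something like the sketch below. The class name `AlertThrottle`, the key format and the injectable clock are assumptions made for the example; the last-notification times are kept in memory here, whereas the answer's "log" could just as well be a file or a table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: allows at most one alert per (block ID, exception class)
// within a configurable quiet period.
public sealed class AlertThrottle
{
    private readonly Dictionary<string, DateTime> _lastSent =
        new Dictionary<string, DateTime>();
    private readonly object _gate = new object();
    private readonly TimeSpan _quietPeriod;
    private readonly Func<DateTime> _clock;

    // clock is injectable so the class can be tested; it defaults to UTC now.
    public AlertThrottle(TimeSpan quietPeriod, Func<DateTime> clock = null)
    {
        _quietPeriod = quietPeriod;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    // Returns true if the administrator should be notified now.
    public bool ShouldNotify(int blockId, Exception ex)
    {
        string key = blockId + ":" + ex.GetType().FullName;
        DateTime now = _clock();

        lock (_gate)
        {
            DateTime last;
            if (_lastSent.TryGetValue(key, out last) && now - last < _quietPeriod)
            {
                return false; // still inside the quiet period for this key
            }

            _lastSent[key] = now;
            return true;
        }
    }
}
```

In the catch block from the question this would be used as `if (throttle.ShouldNotify(1, ex)) { /* send ex to the system administrator */ }`, with one shared `throttle` instance for the whole application.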

Steven A. Lowe answered Oct 08 '22 07:10


Check for similarities (timestamps can be masked using wildcards, ??:?? for example) and at first let every alert be sent to you for a period of time. Then check which ones occurred the most.
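One possible way to do the similarity check is to mask the variable parts of a message before comparing it with earlier ones. This is only a sketch: `ErrorFingerprint` is a hypothetical name, and masking every run of digits (plus GUIDs) is an assumption that is broader than the ??:?? timestamp wildcard mentioned above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: reduces an error message to a stable "fingerprint"
// so that messages differing only in timestamps, IDs, etc. compare equal.
public static class ErrorFingerprint
{
    private static readonly Regex GuidPattern = new Regex(
        @"\b[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b",
        RegexOptions.Compiled);

    private static readonly Regex DigitRun = new Regex(@"\d+", RegexOptions.Compiled);

    public static string Normalize(string message)
    {
        // Mask GUIDs first, otherwise their digits would be masked piecemeal.
        string masked = GuidPattern.Replace(message, "<guid>");

        // Then mask every remaining run of digits (times, dates, row IDs, ports).
        return DigitRun.Replace(masked, "#");
    }
}
```

With this, "Timeout at 14:32:07 for order 991" and "Timeout at 14:32:09 for order 17" both normalize to "Timeout at #:#:# for order #", so they can be counted as the same kind of error.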

Say, there are 1000 exceptions of type A, 964 of type B, 120 of C and 7 of Types D - H.

That means: send an email to the sysadmin for every 100th exception of type A and B, for every 10th of type C, and for every other exception as it occurs.
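The every-Nth rule above could be sketched roughly like this. `SampledAlerter` and its interval table are hypothetical, and the sketch makes one deliberate deviation: it also notifies on the very first occurrence of a type, so a brand-new failure is not held back until its 100th repeat.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: notifies on the first occurrence of an exception type
// and then on every Nth occurrence, where N is configured per type.
public sealed class SampledAlerter
{
    private readonly Dictionary<string, int> _intervals;
    private readonly Dictionary<string, long> _counts = new Dictionary<string, long>();
    private readonly object _gate = new object();

    // intervals maps an exception type name to N; unlisted types use N = 1.
    public SampledAlerter(IDictionary<string, int> intervals)
    {
        _intervals = new Dictionary<string, int>(intervals);
    }

    // Returns true if this occurrence should be e-mailed to the sysadmin.
    public bool ShouldNotify(string exceptionType)
    {
        lock (_gate)
        {
            long count;
            _counts.TryGetValue(exceptionType, out count); // count stays 0 if unseen
            count++;
            _counts[exceptionType] = count;

            int interval;
            if (!_intervals.TryGetValue(exceptionType, out interval) || interval < 1)
            {
                interval = 1; // unknown or misconfigured types: alert every time
            }

            return count == 1 || count % interval == 0;
        }
    }
}
```

For the numbers above, the table would be something like `{ "A": 100, "B": 100, "C": 10 }`, with types D to H left out so that each of their occurrences is reported.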

Pro:
+ Accurate
+ Prevents System-Spam
+ Not much code to implement

Con:
- Needs time to develop a reliable statistic
- Important exceptions could be ignored accidentally
- Relies on humans, who will probably always fail

MechMK1 answered Oct 08 '22 05:10