
Rare Event Detection

Is there a good reference on the algorithms people use for rare event detection? Also, how is the time factor taken into account? If I have a case where successive data points (t_1 to t_n) carry information, how can I factor this into a normal machine learning scenario?

Any pointers would be appreciated.

AlgoMan asked Jun 10 '10 19:06



1 Answer

It may help to describe your scenario in more detail. Since you are trying to find rare events, I assume you have a working definition of "not rare" (for some problem spaces this is really hard).

For instance, let's say we have some process that is not a random walk, such as CPU utilization for some service. To detect rare events, you could take the mean utilization and flag values that lie several standard deviations away from it. Techniques from statistical process control are useful here.
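As a rough illustration of the mean-plus-standard-deviations idea, here is a minimal sketch. The function name, the sample numbers, and the choice of k = 3 are assumptions made for the example, not something prescribed by statistical process control in general.

```python
from statistics import mean, stdev

def control_limit_outliers(baseline, observations, k=3.0):
    """Return (index, value) pairs from `observations` that lie more than
    k sample standard deviations away from the mean of `baseline`."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    lower, upper = mu - k * sigma, mu + k * sigma
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]

# Hypothetical CPU utilization percentages: a "normal" baseline period,
# followed by new readings to check against it.
baseline = [40, 42, 41, 39, 43, 40, 41, 42, 38, 44]
observations = [41, 43, 97, 40, 39]

flagged = control_limit_outliers(baseline, observations)
print(flagged)
```

The baseline is kept separate from the readings being checked so that a large outlier does not inflate the standard deviation it is compared against.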

Now suppose we have a random walk process such as stock prices (can of worms opened... please just assume this for the sake of simplicity). The directional movement from t to t+1 is random. A rare event might be a certain number of consecutive moves in a single direction, or a large move in a single direction at a single time step. See stochastic calculus for the underlying concepts.
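Both kinds of rare event in a random walk can be checked with a single pass over the step differences. The sketch below is only illustrative: the function name, the run length of 5, and the absolute jump threshold are assumptions, and in practice the threshold would probably be scaled to the typical step size.

```python
def rare_moves(prices, run_length=5, jump_threshold=3.0):
    """Scan a series and return (t, kind) events, where t indexes `prices`.

    kind is "run" when the step ending at t is the run_length-th
    consecutive move in the same direction (reported once per run),
    and "jump" when the single step ending at t is at least
    jump_threshold in absolute size.
    """
    steps = [b - a for a, b in zip(prices, prices[1:])]
    events = []
    run = 0
    prev_sign = 0
    for i, step in enumerate(steps):
        sign = (step > 0) - (step < 0)  # +1 up, -1 down, 0 flat
        if sign != 0 and sign == prev_sign:
            run += 1
        else:
            run = 1 if sign != 0 else 0  # a flat step breaks the run
        prev_sign = sign
        t = i + 1  # index of the price reached by this step
        if run == run_length:
            events.append((t, "run"))
        if abs(step) >= jump_threshold:
            events.append((t, "jump"))
    return events

# Made-up prices: five up moves in a row, one down move, then a large jump.
prices = [10, 11, 12, 13, 14, 15, 14, 20]
events = rare_moves(prices)
print(events)
```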

If the state of a process at step t depends only on its state at step t-1, then we can use Markov chains to model the process.
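One way to use a Markov chain for this, sketched under the assumption that the process has already been discretized into a small set of named states, is to estimate transition probabilities from history and flag transitions the model considers very unlikely. The state names, the 0.05 threshold, and both function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(states):
    """Estimate first-order transition probabilities P(cur | prev)
    from an observed sequence of states, by simple counting."""
    counts = defaultdict(Counter)
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(row.values()) for cur, n in row.items()}
        for prev, row in counts.items()
    }

def rare_transitions(states, probs, threshold=0.05):
    """Return (t, prev, cur, p) for each transition whose estimated
    probability p is below `threshold`. Transitions never seen in the
    training data get p = 0.0."""
    flagged = []
    for t, (prev, cur) in enumerate(zip(states, states[1:]), start=1):
        p = probs.get(prev, {}).get(cur, 0.0)
        if p < threshold:
            flagged.append((t, prev, cur, p))
    return flagged

# Hypothetical service states observed during normal operation.
history = ["ok", "ok", "ok", "busy", "ok", "ok", "busy", "busy", "ok", "ok"]
probs = fit_transitions(history)

# A new sequence containing a state the model has never seen.
new_sequence = ["ok", "busy", "down", "ok"]
flagged = rare_transitions(new_sequence, probs)
print(flagged)
```

With so little history, an unseen transition is not necessarily rare; a real model would need far more data, and probably some smoothing of the counts.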

That is a short list of the mathematical techniques available to you. Now on to machine learning. Why do you want to use machine learning? (It is always worth asking, to make sure you are not overcomplicating the problem.) Let's assume that you do and that it is the right solution. The actual algorithm you use is not very important at this stage. What you need to do is define what a rare event is. Alternatively, you can define what a normal event is and look for things that are not normal. Note that these two approaches are not the same thing.

Say we produce a set of rare events r1...rn. Each of those rare events will have some features associated with it. For instance, if a computer failed, there might be features like the last time it was seen on the network, its switch port status, etc. This is actually the most important part of machine learning: training set construction. It usually consists of hand-labeling a set of examples to train the model on. Once you have a better understanding of the feature space, you may be able to train another model to do the labeling for you. Repeat this process until you are satisfied.

Now, if you are able to define your rare event set, it may be cheaper to simply write heuristics. For detecting rare events, I have always found this to work better.
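To make the heuristic route concrete, here is a small sketch built on the computer-failure features mentioned earlier. The field names, the 300-second silence limit, and the host data are all made up for illustration; real rules would come from whatever you learned while defining the rare event set.

```python
from dataclasses import dataclass

@dataclass
class HostStatus:
    # Hypothetical features for one machine.
    name: str
    seconds_since_last_seen: float
    switch_port_up: bool

def looks_failed(host, max_silence_seconds=300.0):
    """Hand-written rule: suspect a failure if the switch port is down
    or the host has been silent on the network for too long."""
    return (not host.switch_port_up) or (
        host.seconds_since_last_seen > max_silence_seconds
    )

hosts = [
    HostStatus("web-1", 12.0, True),
    HostStatus("db-2", 4000.0, True),   # silent for over an hour
    HostStatus("web-3", 5.0, False),    # port reported down
]

suspects = [h.name for h in hosts if looks_failed(h)]
print(suspects)
```

A rule like this needs no training data and is easy to inspect, which is much of its appeal when labeled rare events are scarce.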

Steve Severance answered Oct 16 '22 02:10
