What are good algorithms for detecting abnormality?

Background

Here is the problem:

A black box outputs a new number each day.
Those numbers have been recorded for a period of time.
Detect when a new number from the black box falls outside the pattern of numbers established over the time period.

The numbers are integers, and the time period is a year.

Question

What algorithm will identify a pattern in the numbers?

The pattern might be simple, like always ascending or always descending, or the numbers might fall within a narrow range, and so forth.

Ideas

I have some ideas, but am uncertain as to the best approach, or what solutions already exist:

Machine learning algorithms?
Neural network?
Classify normal and abnormal numbers?
Statistical analysis?

791

asked Sep 22 '10 15:09

Joseph Garvin

2 Answers

Cluster your data.

If you don't know how many modes your data will have, use something like a Gaussian Mixture Model (GMM) along with a scoring function (e.g., Bayesian Information Criterion (BIC)) so you can automatically detect the likely number of clusters in your data. I recommend this instead of k-means if you have no idea what value k is likely to be. Once you've constructed a GMM for you data for the past year, given a new datapoint x, you can calculate the probability that it was generated by any one of the clusters (modeled by a Gaussian in the GMM). If your new data point has low probability of being generated by any one of your clusters, it is very likely a true outlier.

If this sounds a little too involved, you will be happy to know that the entire GMM + BIC procedure for automatic cluster identification has been implemented for you in the excellent MCLUST package for R. I have used it several times to great success for such problems.

Not only will it allow you to identify outliers, you will have the ability to put a p-value on a point being an outlier if you need this capability (or want it) at some point.

answered Oct 13 '22 16:10

awesomo

You could try line fitting prediction using linear regression and see how it goes, it would be fairly easy to implement in your language of choice. After you fitted a line to your data, you could calculate the mean standard deviation along the line. If the novel point is on the trend line +- the standard deviation, it should not be regarded as an abnormality.

PCA is an other technique that comes to mind, when dealing with this type of data.

You could also look in to unsuperviced learning. This is a machine learning technique that can be used to detect differences in larger data sets.

Sounds like a fun problem! Good luck

answered Oct 13 '22 18:10

Theodor

Related questions
                            
                                maximum bipartite matching (ford-fulkerson)
                            
                                How to sort a collection of points so that they set up one after another?
                            
                                Average face - algorithm
                            
                                Find all subarrays of fixed length with a given ranking
                            
                                How to build a Plinko board of words from a dictionary better than brute force?
                            
                                For a given binary tree find maximum binary search sub-tree
                            
                                Is there an algorithm to generate all unique circular permutations of a multiset?
                            
                                Minimize the sum of errors of representative integers
                            
                                Find words and combinations of words that can be spoken the quickest
                            
                                Square Subsequence
                            
                                Concurrent mutable priority queue
                            
                                Four-way navigation algorithm
                            
                                number to unique permutation mapping of a sequence containing duplicates
                            
                                Shifting/aligning/rotating a circular buffer to zero in-place
                            
                                Draw Quadratic Curve on GPU
                            
                                maximal intersection between n sets
                            
                                Algorithm for itertools.combinations in Python
                            
                                Fastest way to calculate primes in C#?
                            
                                What is the most efficient way to determine if a directed graph is singly connected?
                            
                                Reverse regular expressions to generate data

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What are good algorithms for detecting abnormality?

Tags:

algorithm

machine-learning

prediction

black-box