
Anomaly Detection vs Supervised Learning

I have a very small amount of data belonging to the positive class and a large amount belonging to the negative class. According to Prof. Andrew Ng (anomaly detection vs. supervised learning), I should use anomaly detection instead of supervised learning because the data is highly skewed.

Please correct me if I am wrong, but the two techniques look the same to me: in both (supervised) anomaly detection and standard supervised learning, we train on data containing both normal and anomalous samples and then test on unseen data. Is there any difference?

Should I just under-sample the negative class or over-sample the positive class so that both classes are the same size? Would that affect the overall accuracy?

Junaid asked Apr 25 '14 16:04



1 Answer

In supervised learning, the data set is labelled (e.g. 'good' or 'bad') and you pass those labels in when you train the model, so that it learns parameters that separate the 'good' examples from the 'bad' ones.
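As a minimal sketch of what "passing the labels" means, the snippet below fits a nearest-centroid classifier, which is only a deliberately simple stand-in for whatever supervised model you would actually use. The function names (`fit_centroids`, `predict`) and the toy numbers are made up for illustration.

```python
def fit_centroids(rows, labels):
    """Learn one mean vector per class from labelled training rows."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(row, centroids):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((x - c) ** 2 for x, c in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy, made-up data: both classes are labelled and both are used for training.
rows = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [4.0, 4.2], [4.3, 3.9]]
labels = ["good", "good", "good", "bad", "bad"]
centroids = fit_centroids(rows, labels)
print(predict([0.9, 1.1], centroids))  # good
print(predict([4.1, 4.0], centroids))  # bad
```

The point is only that the 'bad' examples take part in training: the model needs enough of them to learn what 'bad' looks like.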

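Anomaly detection, by contrast, is trained in an unsupervised way: you do not pass any labels to the training step, and you fit the model using only the 'non-anomalous' data. You then choose a threshold epsilon by trying candidate values and scoring each one with a metric such as the F1 score, so that the model strikes a good balance between catching true positives and raising false alarms. That scoring step is where the few labelled anomalies you have are used.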
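A rough sketch of that procedure, assuming an independent Gaussian per feature as the density model and a simple linear scan over candidate epsilon values. The helper names, the synthetic data, and the scan itself are illustrative assumptions, not a prescribed implementation.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate per-feature mean and variance from normal-only training rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(dims)]
    return means, variances


def density(row, means, variances):
    """p(x) as a product of independent per-feature Gaussian densities."""
    p = 1.0
    for x, mu, var in zip(row, means, variances):
        p *= math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p


def f1_score(y_true, y_pred):
    """F1 for the positive (anomalous) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_epsilon(p_val, y_val, steps=1000):
    """Scan thresholds on a labelled validation set; keep the best F1."""
    lo, hi = min(p_val), max(p_val)
    best_eps, best_f1 = lo, 0.0
    for i in range(1, steps):
        eps = lo + (hi - lo) * i / steps
        preds = [1 if p < eps else 0 for p in p_val]  # low density => anomaly
        score = f1_score(y_val, preds)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Synthetic, made-up data: training uses normal rows only.
rng = random.Random(0)
train = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(1000)]
val_normal = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(200)]
val_anomalies = [[8.0, 20.0], [-9.0, -10.0], [7.0, -8.0], [-8.0, 18.0], [10.0, 5.0]]

means, variances = fit_gaussian(train)
val_rows = val_normal + val_anomalies
val_labels = [0] * len(val_normal) + [1] * len(val_anomalies)
p_val = [density(r, means, variances) for r in val_rows]
best_eps, best_f1 = select_epsilon(p_val, val_labels)
print(best_eps, best_f1)
```

Note that the labels never reach `fit_gaussian`; they are only used to pick epsilon, which is why a handful of positive examples can be enough.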

Regarding over- or under-sampling so that your data is not skewed, there are two things to consider.

  1. Prof. Ng mentioned something like this: if your positive class is only 10 examples out of 10k or 100k, your data is highly skewed and you should use anomaly detection.
  2. Supervised learning makes sense if you know what 'bad' values typically look like. If you only know what is 'normal'/'good', and the 'bad' values can be very different every time, then that is a good case for anomaly detection.
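To make the skew in point 1 concrete, here is a small sketch of why overall accuracy says little when only 10 of 10,000 examples are positive. The counts for the second predictor (8 of 10 positives found, 20 false alarms) are hypothetical numbers chosen for illustration, not results from any real model.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# 10 positives out of 10,000 examples.
y_true = [1] * 10 + [0] * 9990

# Predictor A: always says "negative".
always_negative = [0] * 10000

# Predictor B (hypothetical): finds 8 of the 10 positives, raises 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 9970

print(accuracy(y_true, always_negative), f1(y_true, always_negative))  # 0.999 0.0
print(accuracy(y_true, detector), round(f1(y_true, detector), 3))      # 0.9978 0.421
```

Predictor A has the higher accuracy yet never finds a single positive, which is why a metric such as F1 is the more useful yardstick here, whichever approach you choose.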
Saiful Rizal MDRamli answered Sep 20 '22 20:09
