
Understand the LocalOutlierFactor algorithm by example

I have worked through the sklearn example of LocalOutlierFactor detection and tried to apply it to an example dataset of mine, but somehow the result does not really make sense to me.

What I have implemented looks like this:

import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn.neighbors import LocalOutlierFactor


# import file
url = ".../Python/outliner.csv"
names = ['R1', 'P1', 'T1', 'P2', 'Flag']
dataset = pandas.read_csv(url, names=names)    

array = dataset.values
X = array[:,0:2] 
rng = np.random.RandomState(42)


# fit the model
clf = LocalOutlierFactor(n_neighbors=50, algorithm='auto', leaf_size=30)
y_pred = clf.fit_predict(X)
y_pred_outliers = y_pred[500:]

# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
Z = clf._decision_function(np.c_[xx.ravel(), yy.ravel()])  # note: private method
Z = Z.reshape(xx.shape)

plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X[:200, 0], X[:200, 1], c='white',
                edgecolor='k', s=20)
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=20)
plt.axis('tight')
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b],
           ["normal observations",
            "abnormal observations"],
           loc="upper left")
plt.show()

I get something like this: [image: LOF outlier detection result]

Can anybody tell me why the detection fails?

I have tried playing with the parameters and ranges, but not much changes in the outlier detection itself.

It would be great if somebody could point me in the right direction with this issue. Thanks.

Edit: added the input file.

Asked Mar 09 '26 by Grisuu

1 Answer

I assume you followed this example. That example compares the actual observations (scatter plot) against the decision function learned from them (contour plot). Since the data is synthetic (200 normal points followed by 20 outliers), the example can simply select the outliers with X[200:] (index 200 onward) and the normal points with X[:200] (indices 0-199).
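For reference, the data in that sklearn example is generated synthetically, roughly like the sketch below (the constants follow the example's setup, not your CSV data), which is why slicing at index 200 separates normals from outliers:

```python
import numpy as np

rng = np.random.RandomState(42)

# two Gaussian clusters of 100 "normal" points each
X_inliers = 0.3 * rng.randn(100, 2)
X_inliers = np.r_[X_inliers + 2, X_inliers - 2]

# 20 uniformly scattered outliers appended at the end,
# which is why X[200:] selects exactly the outliers
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X_inliers, X_outliers]

print(X.shape)  # (220, 2)
```

With your own CSV there is no such known ordering, so slicing by index cannot work.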

So if you want to plot the prediction results (as a scatter plot) instead of the actual observations, do it like the code below: split X based on y_pred (1 = normal, -1 = outlier) and use the two subsets in the scatter plot:

import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn.neighbors import LocalOutlierFactor

# import file
url = ".../Python/outliner.csv"
names = ['R1', 'P1', 'T1', 'P2', 'Flag']
dataset = pandas.read_csv(url, names=names)
X = dataset.values[:, 0:2]

# fit the model
clf = LocalOutlierFactor(n_neighbors=50, algorithm='auto', leaf_size=30)
y_pred = clf.fit_predict(X)

# map results
X_normals = X[y_pred == 1]
X_outliers = X[y_pred == -1]

# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
Z = clf._decision_function(np.c_[xx.ravel(), yy.ravel()])  # private method in scikit-learn
Z = Z.reshape(xx.shape)

plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X_normals[:, 0], X_normals[:, 1], c='white', edgecolor='k', s=20)
b = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red', edgecolor='k', s=20)
plt.axis('tight')
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b], ["normal predictions", "abnormal predictions"], loc="upper left")
plt.show()

As you can see, the scatter plot of the normal data follows the contour plot:

[image: prediction scatter plot over the LOF decision-function contours]
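As an aside, `_decision_function` is a private method (note the leading underscore), and newer scikit-learn versions expose scoring of new points only through the public `decision_function` when the estimator is constructed with `novelty=True`. The `contamination` parameter also directly controls how many points `fit_predict` flags as outliers, which may help since tuning `n_neighbors` alone did not change much. A minimal sketch with synthetic data (assuming scikit-learn >= 0.20; the cluster constants are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = np.r_[0.3 * rng.randn(200, 2) + 2, rng.uniform(-4, 4, size=(20, 2))]

# contamination sets the expected fraction of outliers, so it directly
# controls how many points fit_predict labels as -1
clf = LocalOutlierFactor(n_neighbors=50, contamination=0.1)
y_pred = clf.fit_predict(X)
print((y_pred == -1).sum())  # roughly 10% of the 220 points

# to score a grid of new points, fit a separate estimator with
# novelty=True and use the public decision_function
clf_novelty = LocalOutlierFactor(n_neighbors=50, novelty=True, contamination=0.1)
clf_novelty.fit(X)
xx, yy = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
scores = clf_novelty.decision_function(np.c_[xx.ravel(), yy.ravel()])
print(scores.shape)  # one score per grid point
```

In novelty mode the estimator is meant for scoring unseen points, so `fit_predict` is not available on it; use `fit` followed by `decision_function` or `predict` instead.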

Answered Mar 10 '26 by Yohanes Gultom
