
fisher's linear discriminant in Python

I have to use Fisher's linear discriminant to reduce my examples A and B, which are high-dimensional matrices, to 2D, exactly like LDA. Each example has the classes A and B, so if I had a third example it would also have classes A and B, and a fourth, fifth and n-th example would always have classes A and B as well; I would therefore like to separate them with a simple application of Fisher's linear discriminant. I'm pretty new to machine learning, so I don't know how to separate my classes; I've been following the formula by eye and coding as I go. From what I've been reading, I need to apply a linear transformation to my data so I can find a good threshold for it, but first I need to find the maximization function. For that task I managed to find Sw and Sb, but I don't know how to go on from there...

[image: definitions of the within-class scatter matrix Sw and the between-class scatter matrix Sb]

I also need to find the maximization function:

[image: the maximization function (Fisher criterion) J(w) = (w^T Sb w) / (w^T Sw w)]

That maximization function gives me an eigenvalue solution:

[image: the eigenvalue problem Sw^-1 Sb w = lambda w]

What I have for each class is a 5x2 matrix, for 2 examples. For instance:

Example 1
Class_A = [
    [201, 103],
    [40,   43],
    [23,   50],
    [12,  123],
    [99,   78]
]
Class_B = [
    [201, 129],
    [114, 195],
    [180,  90],
    [69,   62],
    [76,   90]
]

Example 2
Class_A = [
    [68,   98],
    [201, 203],
    [78,  212],
    [49,    5],
    [204,  78]
]
Class_B = [
    [52,   19],
    [220, 219],
    [159, 195],
    [99,   23],
    [46,   50]
]

I tried finding Sw for the example above like this:

Example_1_Class_A = np.dot(Example_1_Class_A,  np.transpose(Example_1_Class_A))
Example_1_Class_B = np.dot(Example_1_Class_B,  np.transpose(Example_1_Class_B))

Example_2_Class_A = np.dot(Example_2_Class_A,  np.transpose(Example_2_Class_A))
Example_2_Class_B = np.dot(Example_2_Class_B,  np.transpose(Example_2_Class_B))

Sw = np.sum([Example_1_Class_A, Example_1_Class_B, Example_2_Class_A, Example_2_Class_B], axis=0)

As for Sb, I tried this:

Example_1_Class_A_mean = Example_1_Class_A.mean(axis=0)
Example_1_Class_B_mean = Example_1_Class_B.mean(axis=0)
         
Example_2_Class_A_mean = Example_2_Class_A.mean(axis=0)
Example_2_Class_B_mean = Example_2_Class_B.mean(axis=0)
         
Example_1_Class_A_Sb = np.dot(Example_1_Class_A_mean, np.transpose(Example_1_Class_A_mean))
Example_1_Class_B_Sb = np.dot(Example_1_Class_B_mean, np.transpose(Example_1_Class_B_mean))
         
Example_2_Class_A_Sb = np.dot(Example_2_Class_A_mean, np.transpose(Example_2_Class_A_mean))
Example_2_Class_B_Sb = np.dot(Example_2_Class_B_mean, np.transpose(Example_2_Class_B_mean))
Sb = np.sum([Example_1_Class_A_Sb, Example_1_Class_B_Sb, Example_2_Class_A_Sb, Example_2_Class_B_Sb], axis=0)

The problem is, I have no idea what else to do with my Sw and Sb; I am completely lost. Basically, what I need to do is get from here to this:

[image: 2D data points projected onto a line, with the class A and class B points forming separate clusters]

How, for given Example A and Example B, do I separate a cluster containing only the class A points and a cluster containing only the class B points?

asked Jun 27 '20 by xnok


People also ask

What is the fisher linear discriminant method?

Fisher's linear discriminant can be used as a supervised learning classifier. Given labeled data, the classifier can find a set of weights to draw a decision boundary, classifying the data. Fisher's linear discriminant attempts to find the vector that maximizes the separation between classes of the projected data.

How do I use LDA in Python?

Compute the eigenvectors and corresponding eigenvalues for the scatter matrices. Sort the eigenvalues and select the top k. Create a new matrix containing the eigenvectors that map to those k eigenvalues. Obtain the new features (i.e. LDA components) by taking the dot product of the data and the matrix from the previous step.

What is linear discriminant analysis Python?

Linear Discriminant Analysis, or LDA for short, is a classification machine learning algorithm. It works by calculating summary statistics for the input features by class label, such as the mean and standard deviation. These statistics represent the model learned from the training data.

Which is better, LDA or QDA?

In general, LDA tends to be better than QDA if there are relatively few training observations, so therefore reducing variance is crucial. QDA is recommended if the training set is very large, so that the variance of the classifier is not a major concern.


2 Answers

Before answering your question, I will first touch on the basic difference between PCA and (F)LDA. In PCA you don't know anything about the underlying classes, but you assume that the information about class separability lies in the variance of the data. So you rotate your original axes (sometimes this is called projecting all the data onto new ones) in such a way that the first new axis points in the direction of most variance, the second one is perpendicular to the first and points in the direction of most residual variance, and so on. This way a PCA transformation results in a (sub)space of the same dimensionality as the original one. Then you can take only the first 2 dimensions, rejecting the rest, hence getting a dimensionality reduction from k dimensions to only 2.
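
For intuition, here is a minimal sketch of that PCA idea on toy data (the data and variable names are purely illustrative): center the data, eigendecompose its covariance, and keep the two highest-variance directions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))        # toy data: 100 samples, 5 features

    Xc = X - X.mean(axis=0)              # center the data
    C = np.cov(Xc.T)                     # 5x5 covariance matrix
    vals, vecs = np.linalg.eigh(C)       # eigh because the covariance is symmetric
    order = np.argsort(vals)[::-1]       # sort directions by variance, descending
    W_pca = vecs[:, order[:2]]           # keep the two highest-variance directions
    X_2d = Xc @ W_pca                    # project from 5 dimensions down to 2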

LDA works a bit differently. In this case you know in advance how many classes there are in your data, and you can find their means and covariance matrices. What the Fisher criterion does is find a direction in which the distance between the class means is maximized while, at the same time, the total variability is minimized (the total variability being the average of the within-class covariance matrices). And for any two classes there is only one such line. This is why, when your data has C classes, LDA can provide you with at most C-1 dimensions, regardless of the original data dimensionality. In your case this means that, as you have only 2 classes, A and B, you will get a one-dimensional projection, i.e. a line. And this is exactly what you have in your picture: the original 2D data is projected onto a line. The direction of the line is the solution of the eigenproblem. Let's generate data that is similar to your picture:

import numpy as np
import matplotlib.pyplot as plt

a = np.random.multivariate_normal((1.5, 3), [[0.5, 0], [0, .05]], 30)
b = np.random.multivariate_normal((4, 1.5), [[0.5, 0], [0, .05]], 30)
plt.plot(a[:,0], a[:,1], 'b.', b[:,0], b[:,1], 'r.')
mu_a, mu_b = a.mean(axis=0).reshape(-1,1), b.mean(axis=0).reshape(-1,1)
Sw = np.cov(a.T) + np.cov(b.T)          # within-class scatter (sum of the class covariances)
inv_S = np.linalg.inv(Sw)
res = inv_S.dot(mu_a-mu_b)              # the trick: w ~ Sw^-1 (mu_a - mu_b)
####
# more general solution
#
# Sb = (mu_a-mu_b)*((mu_a-mu_b).T)
# eig_vals, eig_vecs = np.linalg.eig(inv_S.dot(Sb))
# res = sorted(zip(eig_vals, eig_vecs.T), key=lambda p: p[0], reverse=True)[0][1]  # eigenvector (column) of the largest (and only nonzero) eigenvalue
# res = res / np.linalg.norm(res)

plt.plot([-res[0], res[0]], [-res[1], res[1]])  # this is the solution
plt.plot(mu_a[0], mu_a[1], 'cx')
plt.plot(mu_b[0], mu_b[1], 'yx')
plt.gca().axis('square')

# let's project the data points onto the discriminant direction
r = res.reshape(2,)
n2 = np.linalg.norm(r)**2
for pt in a:
    prj = r * r.dot(pt) / n2
    plt.plot([prj[0], pt[0]], [prj[1], pt[1]], 'b.:', alpha=0.2)
for pt in b:
    prj = r * r.dot(pt) / n2
    plt.plot([prj[0], pt[0]], [prj[1], pt[1]], 'r.:', alpha=0.2)

The resulting projection is calculated using a neat trick for the two-class problem. You can read the details on it here, in section 1.6.
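
If you also want a classification threshold (you mention looking for one in the question), one simple option is to project both class means onto this direction and put the decision boundary halfway between them. A rough sketch, reusing a, b and res from the code above:

    w = res.reshape(-1)                                 # discriminant direction as a flat vector
    proj_a, proj_b = a.dot(w), b.dot(w)                 # 1D projections of both classes
    threshold = (proj_a.mean() + proj_b.mean()) / 2     # midpoint of the projected class means

    def classify(x):
        # label a new 2D point by which side of the threshold its projection falls on
        side_a = proj_a.mean() > threshold
        return 'A' if (x.dot(w) > threshold) == side_a else 'B'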

[image: the generated two-class data with the class means, the discriminant direction, and the points projected onto it]

Regarding the "examples" you mention in your question: I believe you need to repeat the process for each example, as each is a different set of data points, probably with different distributions. Also, note that the estimated means (mu_a, mu_b) and class covariance matrices will be slightly different from the ones the data was generated with, especially for small sample sizes.
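
As a rough sketch of what "repeat the process for each example" could look like with your 5x2 matrices (the helper name fisher_direction is just illustrative):

    import numpy as np

    def fisher_direction(class_a, class_b):
        """Fisher discriminant direction for one example with two classes."""
        class_a, class_b = np.asarray(class_a, float), np.asarray(class_b, float)
        mu_a, mu_b = class_a.mean(axis=0), class_b.mean(axis=0)
        Sw = np.cov(class_a.T) + np.cov(class_b.T)      # within-class scatter
        return np.linalg.inv(Sw).dot(mu_a - mu_b)       # w ~ Sw^-1 (mu_a - mu_b)

    # e.g. the question's Example 1
    Example_1_Class_A = [[201, 103], [40, 43], [23, 50], [12, 123], [99, 78]]
    Example_1_Class_B = [[201, 129], [114, 195], [180, 90], [69, 62], [76, 90]]
    w1 = fisher_direction(Example_1_Class_A, Example_1_Class_B)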

answered Oct 23 '22 by igrinis


Mathematics

[image: the scatter matrix definitions, S_w = sum over classes i of sum over x in class i of (x - m_i)(x - m_i)^T, and S_b = sum over classes i of N_i (m_i - m)(m_i - m)^T]

See https://sebastianraschka.com/Articles/2014_python_lda.html#lda-in-5-steps for more information.

Implementation using Iris

Since you want to use LDA for dimensionality reduction but provide only 2D data, I am showing how to perform this procedure on the iris dataset.

Let's import libraries

    import pandas as pd
    import numpy as np
    import sklearn as sk
    from collections import Counter
    from sklearn import datasets
    
    # load dataset and transform to pandas df
    X, y = datasets.load_iris(return_X_y=True)
    X = pd.DataFrame(X, columns=[f'feat_{i}' for i in range(4)])
    y = pd.DataFrame(y, columns=['labels'])
    tot = pd.concat([X,y], axis=1)
    # calculate class means
    class_means = tot.groupby('labels').mean()
    total_mean = X.mean()

The class_means are given by:

class_means

        feat_0  feat_1  feat_2  feat_3
labels
0        5.006   3.428   1.462   0.246
1        5.936   2.770   4.260   1.326
2        6.588   2.974   5.552   2.026

To compute the within-class scatter matrix S_w we first subtract the corresponding class mean from each observation (that is the x - m_i term in the equation above) and then sum up the weighted outer products (x - m_i)(x - m_i)^T:

x_mi = tot.transform(lambda x: x - class_means.loc[x['labels']], axis=1).drop('labels', axis=1)

def kronecker_and_sum(df, weights):
    S = np.zeros((df.shape[1], df.shape[1]))
    for idx, row in df.iterrows():
        x_m = row.to_numpy().reshape(df.shape[1], 1)   # column vector x - m
        S += weights[idx] * np.dot(x_m, x_m.T)         # weighted outer product
    return S

# Each x_mi is weighted with 1. Now we use the kronecker_and_sum function to calculate the within-class scatter matrix S_w
S_w = kronecker_and_sum(x_mi, 150*[1])
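
As a quick, optional sanity check (just a sketch), this S_w should equal the sum of the per-class covariance matrices scaled by N_i - 1:

    # S_w should equal sum_i (N_i - 1) * Cov_i; for iris, N_i = 50 per class
    S_w_check = sum((len(g) - 1) * np.cov(g.drop(columns='labels').T)
                    for _, g in tot.groupby('labels'))
    assert np.allclose(S_w, S_w_check)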


mi_m = class_means.transform(lambda x: x - total_mean, axis=1)
# Each mi_m is weighted with the number of observations per class which is 50 for each class in this example. We use kronecker_and_sum to calculate the between-class scatter matrix.

S_b = kronecker_and_sum(mi_m, 3*[50])

[image: the generalized eigenvalue problem S_w^-1 S_b v = lambda v]

eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_w).dot(S_b))

We only need to consider the eigenvalues that are markedly different from zero (in this case only the first two):

eig_vals
array([ 3.21919292e+01,  2.85391043e-01,  6.53468167e-15, -2.24877550e-15])
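
To make "markedly different from zero" concrete, a quick check is each eigenvalue's share of the total:

    ev = np.real(eig_vals)       # np.real guards against tiny imaginary parts from np.linalg.eig
    print(ev / ev.sum())         # the first two entries account for essentially 100%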

Transform X with the matrix of the two eigenvectors that correspond to the largest eigenvalues:

    # sort eigenvectors by decreasing eigenvalue and keep the top two (here they already come out in that order)
    order = np.argsort(np.real(eig_vals))[::-1]
    W = np.real(eig_vecs[:, order[:2]])
    X_trafo = np.dot(X, W)
    tot_trafo = pd.concat([pd.DataFrame(X_trafo, index=range(len(X_trafo))), y], axis=1)
    # plot the result
    tot_trafo.plot.scatter(x=0, y=1, c='labels', colormap='viridis')

[image: scatter plot of the iris data in the two LDA dimensions, colored by class]

We have reduced the dimensionality from 4 to 2 and chosen the space in such a way that the classes are well separated.
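
One quick way to quantify "well separated" (just a sketch) is nearest-centroid classification in the reduced 2D space:

    # classify each projected point by its nearest class centroid in the 2D LDA space
    centroids = tot_trafo.groupby('labels')[[0, 1]].mean().to_numpy()   # one 2D centroid per class
    pts = tot_trafo[[0, 1]].to_numpy()
    dists = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = dists.argmin(axis=1)                                         # classes are labeled 0, 1, 2
    print((pred == tot_trafo['labels'].to_numpy()).mean())              # fraction correctly assigned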

Scikit-learn usage

Scikit-learn has LDA support as well. What we did in dozens of lines can be done with the following few lines of code:

from sklearn import discriminant_analysis
lda = discriminant_analysis.LinearDiscriminantAnalysis(n_components=2)
X_trafo_sk = lda.fit_transform(X,y)
pd.DataFrame(np.hstack((X_trafo_sk, y))).plot.scatter(x=0, y=1, c=2, colormap='viridis')

I'm not showing a plot here, because it is the same as in our derived example (except for a 180 degree rotation).
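
If you want to verify that, the two projections should agree up to sign and scale, since the discriminant directions are only defined up to such factors. A quick check, assuming X_trafo from the derivation above:

    # |correlation| per component should be ~1.0: same direction up to sign and scale
    for k in range(2):
        c = np.corrcoef(np.asarray(X_trafo)[:, k], X_trafo_sk[:, k])[0, 1]
        print(abs(c))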

answered Oct 23 '22 by pythonic833