I am using Python 3.6.1 | Anaconda 4.4.0
I am a novice in ML, practicing while learning. I picked up a Kaggle dataset to practice LDA for dimensionality reduction, and ran into two points of confusion:
code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
datasets = pd.read_csv('mushrooms.csv')
X_df = datasets.iloc[:, 1:] # Independent variables
y_df = datasets.iloc[:, 0] # Dependent variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
X_df = X_df.apply(LabelEncoder().fit_transform)
x = OneHotEncoder(sparse=False).fit_transform(X_df.values)
y = LabelEncoder().fit_transform(y_df.values)
# Splitting dataset in to training set and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
#---------------------------------------------
# Applying LDA (Linear Discriminant Analysis)
#---------------------------------------------
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
x_train = lda.fit_transform(x_train, y_train)
x_test = lda.transform(x_test)
This suggests exactly what the error message says: some of your variables are collinear. In other words, the values of one vector are a linear function of the values of another, such as
0, 1, 2, 3
3, 5, 7, 9
In this case, LDA cannot separate their individual influences on the outcome.
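This kind of dependence shows up as rank deficiency in the design matrix, which you can check directly. A minimal sketch with NumPy, using the two example vectors above (the variable names are just for illustration):

```python
import numpy as np

# The two example rows from above: b = 2*a + 3, so b carries
# no information beyond an affine transformation of a.
a = np.array([0, 1, 2, 3])
b = np.array([3, 5, 7, 9])

# Stack them alongside a constant (intercept) column. Collinearity
# means the matrix has fewer independent columns than total columns.
X = np.column_stack([np.ones(4), a, b])
print(X.shape[1])                # 3 columns
print(np.linalg.matrix_rank(X))  # rank 2: one column is redundant
```

One common way to end up in this situation is one-hot encoding every level of every categorical feature, as in the code above: each feature's dummy columns sum to a constant vector of ones, which introduces exactly this kind of linear dependence.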
I can't diagnose anything specific, since you failed to provide the suggested MCVE.