I'm new to machine learning and am trying to work through an error I get when using the OneHotEncoder class: "Expected 2D array, got 1D array instead". As I understand it, a 1D array looks like [1,4,5,6] and a 2D array looks like [[2,3], [3,4], [5,6]], but I still cannot figure out why this is failing. It fails on this line:
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
Here is my whole code:
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)
# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
As you can probably tell from this code, I have 2 columns that were labels. I used LabelEncoder to turn those columns into numbers. I'd like to use OneHotEncoder to take it one step further and turn them into a matrix, so each row would look something like this:
0 1 0
1 0 1
The only thing that came to mind was how I encoded the labels. I did them one by one instead of doing them all at once. Not sure this is the problem.
I was hoping to do something like this:
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
# Transform into a Matrix
onehotencoder2 = OneHotEncoder(categorical_features = [1])
X[:, 1] = onehotencoder2.fit_transform(X[:, 1]).toarray()
Below you will find my whole error:
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 2. 1. 3. 2. 3. 5. 5. 0. 4. 0.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Any help in the right direction would be great.
I got the same error. The error message itself ends with a suggestion:
"Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
Since my data was a single column (one feature), I used X.values.reshape(-1, 1), and it worked. (Another suggestion was to use X.values.reshape instead of X.reshape, which applies when X is a pandas object rather than a NumPy array.)
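To make the two reshape options from the error message concrete, here is a small NumPy sketch. The values are the ones printed in the traceback in the question; the variable names and the extra second column are only illustrative.

```python
import numpy as np

# The 1D column reported in the traceback: shape (10,)
col = np.array([2., 1., 3., 2., 3., 5., 5., 0., 4., 0.])
print(col.ndim, col.shape)              # 1 (10,)

# One feature, many samples: 10 rows, 1 column
as_feature = col.reshape(-1, 1)
print(as_feature.shape)                 # (10, 1)

# One sample, many features: 1 row, 10 columns
as_sample = col.reshape(1, -1)
print(as_sample.shape)                  # (1, 10)

# Indexing with a list keeps the column 2D without any reshape
X = np.column_stack([col, np.arange(10)])
print(X[:, 0].shape, X[:, [0]].shape)   # (10,) (10, 1)
```

For a single column of categories, reshape(-1, 1) (or X[:, [0]]) is the form an encoder expects, because each row is one sample.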
This is an issue in sklearn's OneHotEncoder, raised in https://github.com/scikit-learn/scikit-learn/issues/3662. Most scikit-learn estimators need a 2D array rather than a 1D array.
The standard practice is to pass a 2D array. Since you have already specified which column to treat as categorical with categorical_features = [0], you can rewrite the next line as follows to pass the whole dataset (or part of it). Only the first column is converted to dummy variables, while the encoder still receives a 2D array to work with.
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()
(I hope your dataset doesn't have any more categorical values. I'd advise you to label-encode everything first, then one-hot encode.)
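Note that the categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above only runs on older releases. On newer versions, a rough equivalent uses ColumnTransformer, and OneHotEncoder accepts string columns directly, so the LabelEncoder step is not needed. The sketch below uses a made-up DataFrame; the column names and values are assumptions, not the asker's Data2.csv.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid"],
    "University": ["MIT", "UCLA", "UCLA", "MIT"],
    "Score": [90.0, 85.0, 70.0, 60.0],
})

# One-hot encode the two label columns; pass the numeric column through
ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), ["Name", "University"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense NumPy array
)
X = ct.fit_transform(df)

# 3 Name categories + 2 University categories + 1 numeric column
print(X.shape)  # (4, 6)
```

The encoded columns come first in the output, followed by the passed-through columns, so the column order differs from the original DataFrame.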
I came across a fix by adding
X=X.reshape(-1,1)
The error appears to be gone now, but I'm not sure if this is the right way to fix it.
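One caveat, sketched below with made-up data: reshape(-1, 1) is appropriate when X is a single 1D column. If X is already a 2D feature matrix, the same call stacks every value into one long column, so the error goes away but the rows no longer line up with y. Separately, the one-hot result has one column per category, so it generally cannot be assigned back into the single column X[:, 0]; placing it next to the remaining columns with np.hstack is one option. The array contents are assumptions for illustration, and the sketch assumes a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature matrix: label-encoded category + one numeric feature
codes = np.array([2, 1, 3, 2, 3, 5, 5, 0, 4, 0])
X = np.column_stack([codes, np.linspace(20, 65, 10)])
print(X.shape)                      # (10, 2)

# Reshaping the whole matrix flattens both features into one column
print(X.reshape(-1, 1).shape)       # (20, 1)

# Encode only the first column, kept 2D via list indexing
onehot = OneHotEncoder().fit_transform(X[:, [0]]).toarray()
print(onehot.shape)                 # (10, 6): one column per category

# Put the dummy columns in front of the remaining features
X_encoded = np.hstack([onehot, X[:, 1:]])
print(X_encoded.shape)              # (10, 7)
```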