import numpy as np
import pandas as pd
import matplotlib.pyplot as pt
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.linear_model import LinearRegression
# Load the data: all columns but the last are features, column 1 is the target
data1 = pd.read_csv('stage1_labels.csv')
X = data1.iloc[:, :-1].values  # 2-D array, shape (n_rows, 1)
y = data1.iloc[:, 1].values  # 1-D array, shape (n_rows,)
# Encode the first column as integers, then one-hot encode it
# (the categorical_features argument was removed in scikit-learn 0.22;
# with only one feature column, OneHotEncoder() encodes that column)
label_X = LabelEncoder()
X[:, 0] = label_X.fit_transform(X[:, 0])
encoder = OneHotEncoder()
X = encoder.fit_transform(X).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)
# Fit simple linear regression to the training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predict the test set results
y_pred = regressor.predict(X_test)
# Visualize the training set results
pt.scatter(X_train, y_train, color = 'red')
pt.plot(X_train, regressor.predict(X_train), color = 'green')
pt.title('salary vs yearExp (Training set)')
pt.xlabel('years of experience')
pt.ylabel('salary')
pt.show()
I need help understanding the error I get while executing the above code. Below is the error:
raise ValueError("x and y must be the same size")
I have a .csv file with 1398 rows and 2 columns. I have set aside 40% of the data as the test set (test_size = 0.4), as is visible in the above code.
The solution to this error is simple: make sure the x and y values you pass to the plotting function have the same size. In the example above, X_train and y_train must contain the same number of elements; if x has 4 elements, y must also have 4.
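To make the rule concrete, here is a minimal sketch of the check that pt.scatter performs in the matplotlib versions that raise this message: it flattens x and y and compares their element counts. The helper name same_size_for_scatter is hypothetical and only mimics that check; it is not part of matplotlib.

```python
import numpy as np


def same_size_for_scatter(x, y):
    """Hypothetical helper: mimic scatter's size check by flattening
    both inputs and comparing how many elements each one has."""
    return np.ravel(x).size == np.ravel(y).size


x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])
print(same_size_for_scatter(x, y))      # True: 4 vs 4 elements
print(same_size_for_scatter(x, y[:3]))  # False: 4 vs 3 elements
```

Because both arrays are flattened first, a single-column (n, 1) matrix still passes against a length-n vector; what fails is any x with a different total number of elements than y.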
Print X_train.shape. What do you see? I'd bet X_train is 2-D (a matrix; after the one-hot encoding step it has one column per distinct value of the first CSV column), while y_train is 1-D (a vector). scatter flattens both arrays, so they end up with different sizes.
I think using X_train[:, 0] for plotting (the pt.scatter call is where the error originates) should solve the problem.
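A small sketch of why the sizes differ, assuming (hypothetically) that the first CSV column holds a distinct ID on every row: one-hot encoding turns that column into one 0/1 column per distinct value, so X_train becomes an (n, k) matrix while y_train stays a length-n vector. The toy arrays below stand in for the real data, and np.eye is used as a simple one-hot encoder in place of scikit-learn's OneHotEncoder.

```python
import numpy as np

# Toy stand-in for the CSV: 5 rows, a distinct ID per row, and a 0/1 label
ids = np.array(["a", "b", "c", "d", "e"])
y_train = np.array([0, 1, 0, 0, 1])

# One-hot encode the IDs: one column per distinct ID
_, codes = np.unique(ids, return_inverse=True)
X_train = np.eye(codes.max() + 1)[codes]

print(X_train.shape, y_train.shape)  # (5, 5) (5,)
print(X_train.size, y_train.size)    # 25 5 -> scatter would reject these

# Selecting a single column gives a 1-D array with one value per row
x_col = X_train[:, 0]
print(x_col.shape, x_col.size == y_train.size)  # (5,) True
```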
Slicing with [:, :-1] will give you a 2-dimensional array (all rows, all columns except the last). Slicing with [:, 1] will give you a 1-dimensional array (all rows of the second column). To make this array 2-dimensional as well, use [:, 1:2], [:, 1].reshape(-1, 1) or [:, 1][:, None] instead of [:, 1]. This will make x and y comparable.
An alternative to making both arrays 2-dimensional is making them both 1-dimensional. For this one would use [:, 0] (instead of [:, :1]) to select the first column and [:, 1] to select the second column.
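The shapes described above can be checked directly in NumPy. A minimal sketch, using a hypothetical 3-row, 2-column array of toy values in place of the CSV data:

```python
import numpy as np

# Hypothetical 2-column array standing in for data1.values
data = np.array([[1.0, 40.0],
                 [2.0, 45.0],
                 [3.0, 52.0]])

X = data[:, :-1]  # all columns but the last -> 2-D, shape (3, 1)
y = data[:, 1]    # second column -> 1-D, shape (3,)
print(X.shape, y.shape)  # (3, 1) (3,)

# Three equivalent ways to keep the second column 2-D, matching X
print(data[:, 1:2].shape)               # (3, 1)
print(data[:, 1].reshape(-1, 1).shape)  # (3, 1)
print(data[:, 1][:, None].shape)        # (3, 1)

# Or make both 1-D instead
x_1d = data[:, 0]
y_1d = data[:, 1]
print(x_1d.shape, y_1d.shape)  # (3,) (3,)
```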
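Try this:
x_train = np.arange(0, len(X_train), 1)
and pass x_train as the x values to pt.scatter and pt.plot. It creates an evenly spaced array with one entry per training row, so its size matches y_train and the error goes away. Note that the points are then plotted against the row position rather than against the original feature values.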
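A minimal sketch of what this produces, using hypothetical toy arrays: len(X_train) counts rows, so the index array has exactly one entry per element of y_train, regardless of how many columns X_train has.

```python
import numpy as np

# Hypothetical toy training data: a one-hot style matrix and a label vector
X_train = np.eye(4)
y_train = np.array([0, 1, 1, 0])

# One evenly spaced x value per training row: 0, 1, 2, 3
x_train = np.arange(0, len(X_train), 1)
print(x_train)                       # [0 1 2 3]
print(x_train.size == y_train.size)  # True
```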