
ValueError: x and y must be the same size

import numpy as np
import pandas as pd
import matplotlib.pyplot as pt

data1 = pd.read_csv('stage1_labels.csv')

X = data1.iloc[:, :-1].values
y = data1.iloc[:, 1].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_X = LabelEncoder()
X[:,0] = label_X.fit_transform(X[:,0])
encoder = OneHotEncoder(categorical_features = [0])
X = encoder.fit_transform(X).toarray()

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

# Fit a simple linear regression to the training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict the test set results
y_pred = regressor.predict(X_test)

#Visualization of the training set results
pt.scatter(X_train, y_train, color = 'red')
pt.plot(X_train, regressor.predict(X_train), color = 'green')
pt.title('salary vs yearExp (Training set)')
pt.xlabel('years of experience')
pt.ylabel('salary')
pt.show()

I need help understanding the error I get when executing the code above. This is the error:

"raise ValueError("x and y must be the same size")"

I have a .csv file with 1398 rows and 2 columns. I hold out 40% of the data as the test set, as can be seen in the code above.

user3521180 asked Jan 15 '17 09:01



3 Answers

Print the shape of X_train. What do you see? I'd bet X_train is 2-D (a matrix, with several columns after the one-hot encoding), while y_train is 1-D (a vector). That is why the two end up with different sizes.

I think plotting X_train[:,0] instead of X_train (the plotting call is where the error originates) should solve the problem.
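The shape check suggested above can be sketched as follows. The arrays are small hypothetical stand-ins, not the question's data: the sketch assumes X_train ended up with several one-hot columns, which is what makes its element count differ from that of y_train.

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays (assumption):
# 5 samples, 3 one-hot columns in X_train, one target per sample.
X_train = np.eye(3)[[0, 1, 2, 0, 1]]   # 2-D, shape (5, 3)
y_train = np.array([0, 1, 0, 1, 1])    # 1-D, shape (5,)

# Compare shapes and total element counts of the two arrays.
print(X_train.shape, y_train.shape)
print(X_train.size, y_train.size)

# Selecting one column gives a 1-D array with one value per sample,
# so it lines up with y_train.
first_col = X_train[:, 0]
print(first_col.shape)
```

Which column is meaningful to plot depends on the data; selecting column 0 only makes the lengths agree.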

Lukasz Tracewski answered Nov 19 '22 11:11


Slicing with [:, :-1] will give you a 2-dimensional array (including all rows and all columns excluding the last column).

Slicing with [:, 1] will give you a 1-dimensional array (including all rows from the second column). To make this array also 2-dimensional use [:, 1:2] or [:, 1].reshape(-1, 1) or [:, 1][:, None] instead of [:, 1]. This will make x and y comparable.


An alternative to making both arrays 2-dimensional is to make them both 1-dimensional. To do this, use [:, 0] (instead of [:, :1]) to select the first column and [:, 1] to select the second column.
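A minimal sketch of how these slices differ in shape. The array `data` is a hypothetical 5x2 stand-in for the question's `data1.iloc[...].values`; it assumes the same slicing rules apply, which holds for NumPy arrays.

```python
import numpy as np

# Hypothetical 5x2 array standing in for the two-column CSV (assumption).
data = np.arange(10).reshape(5, 2)

# Slicing with a range keeps the column axis: the result is 2-D.
x_2d = data[:, :-1]
# Indexing with a single integer drops the column axis: the result is 1-D.
y_1d = data[:, 1]

# Three equivalent ways to get the second column as a 2-D array.
y_2d_a = data[:, 1:2]
y_2d_b = data[:, 1].reshape(-1, 1)
y_2d_c = data[:, 1][:, None]

# The alternative: make the first column 1-D as well.
x_1d = data[:, 0]

for name, arr in [("x_2d", x_2d), ("y_1d", y_1d), ("y_2d_a", y_2d_a),
                  ("y_2d_b", y_2d_b), ("y_2d_c", y_2d_c), ("x_1d", x_1d)]:
    print(name, arr.shape)
```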

yogabonito answered Nov 19 '22 09:11


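Try this:

X_train = np.arange(0, len(X_train), 1)

This creates an evenly spaced 1-D array with one entry per row of X_train, so it has the same length as y_train and the error goes away. Note that it replaces the feature values with row positions, so the plot then shows y_train against the row index rather than against the original data.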
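A small sketch of what this replacement does, using hypothetical stand-in arrays and a separate name (`x_index`, not in the original) so the features are not overwritten:

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays (assumption).
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
y_train = np.array([1, 0, 1])

# One integer per row of X_train: 0, 1, ..., len(X_train) - 1.
x_index = np.arange(0, len(X_train), 1)

print(x_index.shape, y_train.shape)
print(x_index)
```

The lengths now match, but `x_index` holds only row positions, so a scatter plot of it against `y_train` says nothing about the relationship between the features and the target.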

Ritik Kesharwani answered Nov 19 '22 09:11