
sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

Tags:

scikit-learn

I'm just trying to fit a simple linear regression, but I'm baffled by the error this code raises:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values)

which produces:

ValueError: Found arrays with inconsistent numbers of samples: [  1 999]

These selections must have the same dimensions, and they should be numpy arrays, so what am I missing?

sunny asked Jun 12 '15 22:06


2 Answers

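It looks like sklearn requires the feature data to be a 2-D array of shape (row number, column number), i.e. (n_samples, n_features). If your data has shape (row number,), like (999,), it does not work: the 1-D array appears to be treated as a single sample with 999 features, which is why the error reports [  1 999]. Use numpy.reshape() to change the shape of the array to (999, 1), e.g.: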

data = data.reshape((999, 1))

In my case, it worked with that.
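As a minimal sketch of the reshape fix, the snippet below uses synthetic data in place of the question's DataFrame columns (the arrays `x` and `y` and the coefficients 3.0 and 2.0 are made up for illustration). It uses `reshape(-1, 1)` so that NumPy infers the number of rows instead of hard-coding 999.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the two selected columns (assumed data, not from the question).
rng = np.random.default_rng(0)
x = rng.normal(size=999)   # 1-D feature array, shape (999,)
y = 3.0 * x + 2.0          # 1-D target array, shape (999,)

# Turn the 1-D feature array into a single-column 2-D array.
# -1 lets NumPy infer the row count, so this works for any length.
X = x.reshape(-1, 1)       # shape (999, 1): 999 samples, 1 feature

regr = LinearRegression()
regr.fit(X, y)             # the target may stay 1-D
```

Only the feature array needs to be 2-D here; the target can remain a 1-D array with one value per sample.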

Yul answered Nov 09 '22 16:11


It looks like you are using a pandas DataFrame (judging by the name df2).

You could also do the following:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].to_frame(), df2.iloc[1:1000, 2].to_frame())

NOTE: I have removed ".values" because it converts the pandas Series to a numpy.ndarray, and numpy.ndarray has no to_frame() method.
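A related sketch, assuming a made-up DataFrame in place of df2 (the column names `c0` to `c5` and the linear relationship are hypothetical): passing a list of column positions to iloc keeps the selection as a one-column DataFrame, so no to_frame() call is needed for the features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for df2: 1000 rows, six numeric columns.
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.normal(size=(1000, 6)),
                   columns=[f"c{i}" for i in range(6)])
# Make the column at position 2 an exact linear function of the one at position 5.
df2["c2"] = 2.0 * df2["c5"] - 1.0

# A list index ([5]) returns a DataFrame; a scalar index (2) returns a Series.
X = df2.iloc[1:1000, [5]]   # DataFrame, shape (999, 1)
y = df2.iloc[1:1000, 2]     # Series, shape (999,)

regr = LinearRegression()
regr.fit(X, y)
```

Note that the slice 1:1000 selects rows 1 through 999, so the first row (position 0) is left out, which matches the 999 samples in the error message.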

user24981 answered Nov 09 '22 14:11