I generate a pandas DataFrame from read_sql_query. It has three columns: "result", "speed" and "weight". I want to use scikit-learn's LinearRegression to fit result = f(speed, weight).
I haven't been able to find the correct syntax that would allow me to pass this DataFrame, or column slices of it, to LinearRegression.fit(y, X). The slices have these shapes:
print df['result'].shape
print df[['speed', 'weight']].shape
(8L,)
(8, 2)
But I cannot pass those slices to fit:
lm.fit(df['result'], df[['speed', 'weight']])
It raises a deprecation warning and a ValueError:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19.
ValueError: Found arrays with inconsistent numbers of samples: [1 8]
What is the efficient, clean way to take DataFrames of targets and features and pass them to fit operations?
This is how I generated the example:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
data2 = np.random.randint(1, high=100, size=len(days))
data3 = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'result': data, 'speed': data2, 'weight': data3})
df = df.set_index('test')
print(df)
You are passing the arguments in the wrong order. All scikit-learn estimators that implement fit() accept the inputs as X, y (features first, then target), not y, X as you are doing.
Try this:
lm.fit(df[['speed', 'weight']], df['result'])
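To make the argument order concrete, here is a minimal, self-contained sketch. The data is hypothetical (small hand-written columns standing in for the read_sql_query result), and the target is built from a known linear rule purely so the fitted coefficients can be checked; it is not the asker's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the DataFrame returned by read_sql_query.
df = pd.DataFrame({
    "speed": [10, 20, 30, 40, 50, 60, 70, 80],
    "weight": [5, 3, 8, 1, 9, 2, 7, 4],
})
# Assumed relationship, chosen only so the fit is easy to verify.
df["result"] = 2.0 * df["speed"] + 3.0 * df["weight"] + 5.0

X = df[["speed", "weight"]]  # 2-D features: (n_samples, n_features)
y = df["result"]             # 1-D target:   (n_samples,)

lm = LinearRegression()
lm.fit(X, y)                 # features first, target second

print(X.shape, y.shape)      # (8, 2) (8,)
print(lm.coef_)              # approximately [2. 3.]
print(lm.intercept_)         # approximately 5.0
```

Note the double brackets in df[["speed", "weight"]]: they select a two-column DataFrame, while the single brackets in df["result"] return a one-dimensional Series.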
First of all, fit() takes X, y and not y, X.
Second, it's important to remember that scikit-learn works with array-like objects. It expects X to have shape (n_samples, n_features) and y to have shape (n_samples,).
It checks these shapes when you call fit, so if your X and y don't follow these rules, it may raise an error. The good news is that X in the example below already has shape (5, 2). But y, built as a one-column DataFrame, has shape (5, 1), which is different from (5,), so your program might fail.
To be safe, I'd simply convert X and y to NumPy arrays from the start.
X = pd.DataFrame(np.ones((5, 2)))
y = pd.DataFrame(np.ones((5,)))
X = np.array(X)
y = np.array(y).squeeze()
For y to go from shape (5,1) to shape (5,), you need to use .squeeze()
This will give you the right shapes and hopefully the program will run!
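As a quick check of the shapes discussed above, the sketch below prints them for a one-column DataFrame, for the Series obtained by selecting that column, and for the squeezed array. The variable names y_frame and y_series are only illustrative; to_numpy() and ravel() are shown as an alternative to np.array(...).squeeze(), assuming a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(np.ones((5, 2)))
y_frame = pd.DataFrame(np.ones((5,)))  # one-column DataFrame
y_series = y_frame[0]                  # selecting the column gives a Series

print(X.shape)                              # (5, 2)
print(y_frame.shape)                        # (5, 1)
print(y_series.shape)                       # (5,)
print(np.array(y_frame).squeeze().shape)    # (5,)
print(y_frame.to_numpy().ravel().shape)     # (5,)
```

So if the target is already a single column selected with single brackets, as in df['result'], it is one-dimensional and no squeeze is needed.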