I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.
This is the dataframe I have:
data_df =

    date        value
    2016-01-15  1555
    2016-01-16  1678
    2016-01-17  1789
    ...

    y = np.asarray(data_df['value'])
    X = data_df[['date']]
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.7, random_state=42)

    model = LinearRegression()        # create linear regression object
    model.fit(X_train, y_train)       # train model on train data
    model.score(X_train, y_train)     # check score

    print('Coefficient: \n', model.coef_)
    print('Intercept: \n', model.intercept_)
    coefs = zip(model.coef_, X.columns)
    model.__dict__
    # linear model
    print("sl = %.1f + " % model.intercept_ + " + ".join("%.1f %s" % coef for coef in coefs))
I tried to convert the date, without success:
    data_df['conv_date'] = data_df.date.apply(lambda x: x.toordinal())
    data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%M-%D")
Linear regression doesn't work on date data, so the dates need to be converted to numerical values first. The following code parses the column as datetimes and then replaces each date with its ordinal (an integer day count):
    import datetime as dt

    # the column is named 'date' in the question's dataframe
    data_df['date'] = pd.to_datetime(data_df['date'])
    data_df['date'] = data_df['date'].map(dt.datetime.toordinal)
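To show how the conversion fits into the question's goal of predicting a value at a future date, here is a minimal end-to-end sketch. It uses only the three rows shown in the question (the real dataframe has more), so it skips train_test_split. The `date_ordinal` column name and the target date 2016-01-20 are assumptions made for illustration.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Only the three rows shown in the question; the real dataframe is longer.
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# Parse the strings as datetimes, then map each one to its ordinal day number.
data_df["date"] = pd.to_datetime(data_df["date"])
data_df["date_ordinal"] = data_df["date"].map(dt.datetime.toordinal)  # hypothetical column name

X = data_df[["date_ordinal"]]   # 2-D feature matrix, as scikit-learn expects
y = data_df["value"]

model = LinearRegression()
model.fit(X, y)

# The future date must go through the same conversion before predicting.
future_date = dt.date(2016, 1, 20)  # illustrative target date
X_future = pd.DataFrame({"date_ordinal": [future_date.toordinal()]})
prediction = model.predict(X_future)[0]

print("slope per day:", model.coef_[0])
print("predicted value on", future_date, ":", prediction)
```

Because the feature is a day count, the fitted coefficient reads as the change in value per day. Keeping the original `date` column and writing the ordinals to a separate column also makes it easier to plot or inspect the data afterwards.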