
How to make predictions with Linear Regression Model?

I am currently working on a linear regression project where I need to gather data, fit a model to it, and then make predictions on test data.

If I'm correct, simple linear regression works with two variables: X (independent) and Y (dependent). I have the following Dataset, where I consider the minute column (the time) to be X and the value column to be Y:

+-----+------+
|value|minute|
+-----+------+
| 5000|   672|
| 6000|   673|
| 7000|   676|
| 8000|   678|
| 9000|   680|
+-----+------+
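For intuition, here is a minimal plain-Java sketch (no Spark involved) of what simple linear regression computes for the table above: the ordinary least-squares line value = intercept + slope × minute, using the closed-form formulas slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The class and method names are hypothetical, made up for illustration. With its defaults (no regularization, intercept enabled) Spark's LinearRegression should arrive at this same line, although I have not checked its exact output.

```java
// Hypothetical helper (not Spark's API): ordinary least-squares fit of
// y = intercept + slope * x on the question's five rows, then a prediction.
public class SimpleOls {

    /** Returns {intercept, slope} of the least-squares line through (x[i], y[i]). */
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of cross-deviations and squared x-deviations.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    /** Predicted y for a new x, given {intercept, slope}. */
    public static double predict(double[] coef, double x) {
        return coef[0] + coef[1] * x;
    }

    public static void main(String[] args) {
        double[] minute = {672, 673, 676, 678, 680};      // X (independent)
        double[] value = {5000, 6000, 7000, 8000, 9000};  // Y (dependent)

        double[] coef = fit(minute, value);
        System.out.printf("intercept=%.2f slope=%.2f%n", coef[0], coef[1]);
        System.out.printf("predicted value at minute 700: %.2f%n", predict(coef, 700));
    }
}
```

On this data the slope works out to 468.75 and the prediction for minute 700 to about 18343.75. Note that 700 lies well outside the observed minutes (672 to 680), so this is an extrapolation of the fitted line.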

What I don't know is how to correctly fit a linear regression model to this Dataset. I've worked with k-means before, and there I created a features column in vector form. I did the same with this dataset:

VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"minute", "value"})
                .setOutputCol("features");

Dataset<Row> vectorData = assembler.transform(dataset);

I then fit a linear regression model to it:

LinearRegression lr = new LinearRegression();
LinearRegressionModel model = lr.fit(vectorData);

This is the part where I get stuck. How can I make predictions with this model? I want to find the value of value when minute is equal to an arbitrary minute, e.g. 700.

How can I do that? How can I get a prediction (estimate) of my Y value for an arbitrary X value?

EDIT: Does the linear regression model differentiate between the dependent and independent variables? How?
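On the EDIT question: the algorithm itself does not work out which variable is dependent; you tell it. In Spark ML, LinearRegression reads the independent variables from the vector column named by featuresCol (default "features") and the dependent variable from the numeric column named by labelCol (default "label"). For this data the assembler's input columns would presumably be just "minute", and the estimator would be created with new LinearRegression().setLabelCol("value"). In the code above, "value" ends up inside the features vector and, as far as I can tell, there is no "label" column for fit to read. The roles matter because least squares is not symmetric. The plain-Java sketch below (no Spark; the class name is hypothetical) regresses value on minute, then minute on value, and shows that inverting the second slope does not give back the first.

```java
// Hypothetical demo (not Spark's API): least squares treats X and Y asymmetrically,
// so which column is the label changes the fitted line.
public class RegressionDirection {

    /** Least-squares slope of y regressed on x. */
    public static double slope(double[] x, double[] y) {
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        double[] minute = {672, 673, 676, 678, 680};
        double[] value = {5000, 6000, 7000, 8000, 9000};

        // value as the dependent variable (what the question wants)
        double valueOnMinute = slope(minute, value);
        // minute as the dependent variable, then solved back for value
        double invertedMinuteOnValue = 1.0 / slope(value, minute);

        System.out.println("slope of value ~ minute:            " + valueOnMinute);
        System.out.println("inverted slope of minute ~ value:   " + invertedMinuteOnValue);
    }
}
```

The first slope is 468.75 and the inverted one is roughly 476.19; the two lines coincide only when the points lie exactly on a line.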

Guillermo Herrera asked Nov 26 '25 23:11



1 Answer

I've only started with Spark MLlib, and especially linear regression, so I can only discuss technicalities (not why things work this way in machine learning).

This is the part where I get stuck. How can I make predictions with this model?

Models are transformers (like VectorAssembler) that offer a very simple interface: the transform operator.

transform(dataset: Dataset[_]): DataFrame
Transforms the input dataset.

That's where you pass in a dataset and get back another dataset with a prediction column. That is, by the way, the general approach: fit an estimator to train a model, then call transform on the model to make predictions.

The following gives you the predictions computed from the features in the input dataset.

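// dataset must contain the same "features" column (built the same way) that the model was trained on
val dataset = ...
model.transform(dataset).select("prediction").show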
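To make the transform step concrete: for linear regression, the prediction column should simply be intercept + coefficients · features evaluated row by row, and a fitted LinearRegressionModel exposes those values as intercept() and coefficients(). The plain-Java sketch below mimics that on a list of feature vectors. The FittedLine class and its transform method are hypothetical stand-ins, not Spark's API, and the numbers are the least-squares line for the question's table with minute as the only feature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a fitted linear regression model (not Spark's API):
// prediction = intercept + dot(coefficients, features), computed per row.
public class FittedLine {
    private final double intercept;
    private final double[] coefficients;

    public FittedLine(double intercept, double[] coefficients) {
        this.intercept = intercept;
        this.coefficients = coefficients.clone();
    }

    /** Prediction for one row of features. */
    public double predict(double[] features) {
        if (features.length != coefficients.length) {
            throw new IllegalArgumentException("expected " + coefficients.length + " features");
        }
        double y = intercept;
        for (int i = 0; i < features.length; i++) {
            y += coefficients[i] * features[i];
        }
        return y;
    }

    /** Mimics transform: one prediction per input row, in order. */
    public List<Double> transform(List<double[]> rows) {
        List<Double> predictions = new ArrayList<>();
        for (double[] row : rows) {
            predictions.add(predict(row));
        }
        return predictions;
    }

    public static void main(String[] args) {
        // Least-squares line for the question's table, minute as the only feature.
        FittedLine model = new FittedLine(-309781.25, new double[] {468.75});
        List<double[]> newMinutes = List.of(new double[] {700}, new double[] {675});
        System.out.println(model.transform(newMinutes)); // prints [18343.75, 6625.0]
    }
}
```

In Spark itself, a new row such as minute 700 would still have to pass through the same VectorAssembler used for training, because transform reads the model's features column rather than the raw "minute" column.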

I'd strongly recommend using Spark MLlib's ML Pipeline feature for the so-called predictive analytics workflow, which makes the process of transforming raw data into the format an Estimator expects much more pleasant. See the Machine Learning Library (MLlib) Guide, and especially ML Pipelines.

ML Pipelines provide a uniform set of high-level APIs built on top of DataFrames that help users create and tune practical machine learning pipelines.

Jacek Laskowski answered Nov 29 '25 20:11



