Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Field "features" does not exist. SparkML

I am trying to build a model in Spark ML with Zeppelin. I am new to this area and would like some help. I think i need to set the correct datatypes to the column and set the first column as the label. Any help would be greatly appreciated, thank you

val training = sc.textFile("hdfs:///ford/fordTrain.csv")
val header = training.first
val inferSchema = true  
val df = training.toDF

val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)

 val lrModel = lr.fit(df)

// Print the coefficients and intercept for multinomial logistic regression
println(s"Coefficients: \n${lrModel.coefficientMatrix}")
println(s"Intercepts: ${lrModel.interceptVector}")

A snippet of the csv file i am using is:

IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0,
like image 741
Young4844 Avatar asked Jul 06 '17 13:07

Young4844


1 Answers

As you have mentioned, you are missing the features column. It is a vector containing all predictor variables. You have to create it using VectorAssembler.

IsAlert is the label and all others variables (p1,p2,...) are predictor variables, you can create features column (actually you can name it anything you want instead of features) by:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

//creating features column
val assembler = new VectorAssembler()
  .setInputCols(Array("P1","P2","P3","P4","P5","P6","P7","P8","E1","E2"))
  .setOutputCol("features")


val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features")   // setting features column
  .setLabelCol("IsAlert")       // setting label column

//creating pipeline
val pipeline = new Pipeline().setStages(Array(assembler,lr))

//fitting the model
val lrModel = pipeline.fit(df)

Refer: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler.

like image 63
vdep Avatar answered Oct 19 '22 02:10

vdep