Field "features" does not exist. SparkML

Question

I am trying to build a model in Spark ML with Zeppelin. I am new to this area and would like some help. I think I need to set the correct data types for the columns and set the first column as the label. Any help would be greatly appreciated, thank you.

val training = sc.textFile("hdfs:///ford/fordTrain.csv")
val header = training.first
val inferSchema = true  
val df = training.toDF

val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)

 val lrModel = lr.fit(df)

// Print the coefficients and intercept for multinomial logistic regression
println(s"Coefficients: \n${lrModel.coefficientMatrix}")
println(s"Intercepts: ${lrModel.interceptVector}")

A snippet of the CSV file I am using is:

IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0,

vdep · Accepted Answer

As you have mentioned, you are missing the features column. It is a vector containing all predictor variables. You have to create it using VectorAssembler.
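To make the idea of a vector column concrete, here is a minimal sketch of what VectorAssembler produces. It assumes Spark 2.x or later; the one-row DataFrame is built in memory from the first values of the CSV snippet in the question, and the names sample and assembled are only illustrative.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// In Zeppelin a session named `spark` usually exists already and getOrCreate() reuses it;
// master("local[*]") only matters when this runs as a standalone script.
val spark = SparkSession.builder().master("local[*]").appName("features-sketch").getOrCreate()
import spark.implicits._

// One row from the question's CSV snippet, reduced to two predictors (illustrative)
val sample = Seq((0, 34.7406, 9.84593)).toDF("IsAlert", "P1", "P2")

// Combine the numeric predictor columns into a single vector column
val assembled = new VectorAssembler()
  .setInputCols(Array("P1", "P2"))
  .setOutputCol("features")
  .transform(sample)

assembled.printSchema()   // "features" is listed as a vector column
assembled.show(false)     // the original columns plus the assembled vector
```

The original columns are kept, and the new features column holds the P1 and P2 values of each row as one vector, which is the shape LogisticRegression expects.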

IsAlert is the label and all the other variables (P1, P2, ...) are predictor variables. You can create the features column (you can actually name it anything you want instead of features) with:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// creating the features column from the predictor columns
// (df must contain these as numeric columns, not as one string column)
val assembler = new VectorAssembler()
  .setInputCols(Array("P1","P2","P3","P4","P5","P6","P7","P8","E1","E2"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features")   // setting features column
  .setLabelCol("IsAlert")       // setting label column

// creating the pipeline: assemble the features, then fit the regression
val pipeline = new Pipeline().setStages(Array(assembler, lr))

// fitting the model (the result is a PipelineModel)
val lrModel = pipeline.fit(df)
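One thing to watch for: pipeline.fit returns a PipelineModel, so the coefficientMatrix and interceptVector calls from the question are not available on it directly. A rough sketch of how the fitted logistic regression stage could be pulled out is below. It assumes Spark 2.1 or later; the toy rows are made up purely for illustration, and the names toy, pipelineModel and fittedLr are hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Reuses the existing session in Zeppelin; local[*] is only for standalone runs
val spark = SparkSession.builder().master("local[*]").appName("lr-stage-sketch").getOrCreate()
import spark.implicits._

// Made-up toy data (label plus two predictors), not taken from the Ford file
val toy = Seq(
  (0, 34.7, 9.8),
  (1, 12.1, 3.4),
  (0, 30.2, 8.9),
  (1, 10.5, 2.7)
).toDF("IsAlert", "P1", "P2")

val assembler = new VectorAssembler()
  .setInputCols(Array("P1", "P2"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)
  .setFeaturesCol("features")
  .setLabelCol("IsAlert")

// Fitting the pipeline yields a PipelineModel, not a LogisticRegressionModel
val pipelineModel = new Pipeline().setStages(Array(assembler, lr)).fit(toy)

// The logistic regression is the last stage, so cast that stage to its model type
val fittedLr = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]

println(s"Coefficients: \n${fittedLr.coefficientMatrix}")
println(s"Intercepts: ${fittedLr.interceptVector}")
```

The cast relies on the regression being the final stage; if the stages are reordered, the index has to change with them.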

Reference: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler
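On the data type part of the question: sc.textFile followed by toDF gives a DataFrame with a single string column, so the P1, P2, ... columns that VectorAssembler needs do not exist yet. A possible way to get typed, named columns is the DataFrame CSV reader with the header and inferSchema options. The sketch below assumes Spark 2.x or later. To stay runnable anywhere, it writes the row from the question (without its trailing comma) to a temporary local file; that stand-in file and the names csvText, tmp and path are assumptions, and in practice path would be hdfs:///ford/fordTrain.csv.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Reuses the existing session in Zeppelin; local[*] is only for standalone runs
val spark = SparkSession.builder().master("local[*]").appName("csv-load-sketch").getOrCreate()

// Stand-in for the real file: header plus one row from the question.
// A local temp file is only visible to Spark in local mode.
val csvText =
  """IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
    |0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0
    |""".stripMargin
val tmp = Files.createTempFile("fordTrain", ".csv")
Files.write(tmp, csvText.getBytes(StandardCharsets.UTF_8))
val path = tmp.toUri.toString   // replace with "hdfs:///ford/fordTrain.csv" for the real data

// header: take column names from the first line
// inferSchema: scan the data and pick numeric types instead of strings
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(path)

df.printSchema()
```

A DataFrame loaded this way has IsAlert and the predictor columns as separate numeric columns, so it can be passed to pipeline.fit as in the answer.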

Tags: scala, apache-spark-ml, apache-zeppelin

Young4844
