Using pySpark ML API in version 2.0.0 for a linear regression simple example, I get an error with new ML library.
The code is:
from pyspark.sql import SQLContext
sqlContext =SQLContext(sc)
from pyspark.mllib.linalg import Vectors
data=sc.parallelize(([1,2],[2,4],[3,6],[4,8]))
def f2Lp(inStr):
return (float(inStr[0]), Vectors.dense(inStr[1]))
Lp = data.map(f2Lp)
testDF=sqlContext.createDataFrame(Lp,["label","features"])
(trainingData, testData) = testDF.randomSplit([0.8,0.2])
from pyspark.ml.regression import LinearRegression
lr=LinearRegression()
model=lr.fit(trainingData)
and error:
IllegalArgumentException: u'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'
how I should transform vector features from .mllib to .ml type ?
From Spark2.0 use
from pyspark.ml.linalg import Vectors, VectorUDT
instead of
from pyspark.mllib.linalg import Vectors, VectorUDT
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With