 

pyspark expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)

I get the error

expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)

when I run the following.

I have a function that I convert to a UDF in order to transform the values of a DataFrame column, like this:

def func(vector):
   #does something

   return Vectors.dense(vector)

udfunc = udf(func, ArrayType(FloatType()))

new_df = df.withColumn("vector", udfunc(df.vector))
new_df.show()

The column df.vector holds DenseVector values.

Does anybody have an idea how to fix this problem, or a hint?

Thanks in advance

sedioben asked Jul 07 '16 15:07



1 Answer

Given the part of the code you provided, the obvious issue is that you declare an incorrect return type. The Catalyst type of a Vector is VectorUDT, not ArrayType(FloatType()):

from pyspark.mllib.linalg import Vectors, VectorUDT
from pyspark.sql.types import ArrayType, FloatType
from pyspark.sql.functions import udf

dummy_udf = udf(lambda _: Vectors.dense([0, 0, 0]), VectorUDT())

sc.parallelize([(Vectors.dense([1, 1, 1]), )]).toDF(["x"]).select(dummy_udf("x"))

In Spark 2.0 and later, use pyspark.ml.linalg instead of pyspark.mllib.linalg to stay compatible with the pyspark.ml API.

zero323 answered Nov 15 '22 13:11
