I get the error
expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)
when running the code below. I have a function that I convert to a udf in order to transform the values of a DataFrame column, like this:
def func(vector):
    # does something
    return Vectors.dense(vector)

udfunc = udf(func, ArrayType(FloatType()))
new_df = df.withColumn("vector", udfunc(df.vector))
new_df.show()
The column df.vector contains DenseVector values.
Does anybody have an idea how to fix this problem, or a hint?
Thanks in advance.
Given the part of the code you provided, the obvious issue is that you declare an incorrect return type. The Catalyst type of a Vector is VectorUDT, not ArrayType(FloatType()):
from pyspark.mllib.linalg import Vectors, VectorUDT
from pyspark.sql.types import ArrayType, FloatType
from pyspark.sql.functions import udf
dummy_udf = udf(lambda _: Vectors.dense([0, 0, 0]), VectorUDT())
sc.parallelize([(Vectors.dense([1, 1, 1]), )]).toDF(["x"]).select(dummy_udf("x"))
In Spark 2.0 and later, use pyspark.ml.linalg to achieve compatibility with the pyspark.ml API.