I am writing a UDAF to be applied to a Spark DataFrame column of type Vector (spark.ml.linalg.Vector). I rely on the spark.ml.linalg package so that I do not have to convert back and forth between DataFrames and RDDs.
Inside the UDAF, I have to specify data types for the input, buffer, and output schemas:
def inputSchema: StructType = new StructType().add("features", new VectorUDT())

def bufferSchema: StructType =
  StructType(StructField("list_of_similarities", ArrayType(new VectorUDT(), true), true) :: Nil)

override def dataType: DataType = ArrayType(DoubleType, true)
VectorUDT is what I would use with spark.mllib.linalg.Vector: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
However, when I try to import it from spark.ml instead:
import org.apache.spark.ml.linalg.VectorUDT
I get a runtime error (there are no errors at build time):
class VectorUDT in package linalg cannot be accessed in package org.apache.spark.ml.linalg
Is this expected? Can you suggest a workaround?
I am using Spark 2.0.0.
In Spark 2.0.0, the proper way to go is to use org.apache.spark.ml.linalg.SQLDataTypes.VectorType instead of VectorUDT. It was introduced in this issue.
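As a concrete illustration, here is a minimal sketch of how the three schema definitions from the question could be rewritten against the public VectorType constant. It assumes Spark 2.0.0 is on the classpath; the field names "features" and "list_of_similarities" are simply taken from the question's code.

```scala
// SQLDataTypes.VectorType is the public alias for the (package-private)
// ml.linalg.VectorUDT, so it can be used anywhere a DataType is expected.
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.sql.types._

// Input schema: a single Vector column named "features"
val inputSchema: StructType = new StructType().add("features", VectorType)

// Buffer schema: a nullable array of Vectors
val bufferSchema: StructType =
  StructType(StructField("list_of_similarities", ArrayType(VectorType, true), true) :: Nil)

// Output type: a nullable array of doubles
val dataType: DataType = ArrayType(DoubleType, true)
```

Inside an actual UDAF these would be the bodies of inputSchema, bufferSchema, and dataType; the only change from the question's code is replacing each new VectorUDT() with VectorType.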