I am using Spark's MultilayerPerceptronClassifier. This generates a column 'prediction' in 'predictions'. When I try to show it I get the error:
SparkException: Failed to execute user defined function($anonfun$1: (vector) => double) ...
Caused by: java.lang.IllegalArgumentException: requirement failed: A & B Dimension mismatch!
Other columns, for example 'vector', display fine. Part of the predictions schema:
|-- vector: vector (nullable = true)
|-- prediction: double (nullable = true)
My code is:
//racist is boolean, needs to be string:
val train2 = train.withColumn("racist", 'racist.cast("String"))
val test2 = test.withColumn("racist", 'racist.cast("String"))
val indexer = new StringIndexer().setInputCol("racist").setOutputCol("indexracist")
val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector") //.setVectorSize(3).setMinCount(0)
val layers = Array[Int](4, 5, 2)
val mpc = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100).setFeaturesCol("vector").setLabelCol("indexracist")
val pipeline = new Pipeline().setStages(Array(indexer, word2Vec, mpc))
val model = pipeline.fit(train2)
val predictions = model.transform(test2)
predictions.select("prediction").show()
EDIT: the proposed similar question's problem was
val layers = Array[Int](0, 0, 0, 0)
which is not the case here, nor is it the same error.
EDIT AGAIN: part0 of train and test are saved in Parquet format here.
Adding .setVectorSize(3).setMinCount(0) and changing the layers to val layers = Array[Int](3, 5, 2) made it work. The first entry of layers must equal the dimension of the feature vector: with setVectorSize commented out, Word2Vec produced vectors of its default size (100) while the input layer expected 4, hence the dimension mismatch.
val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector").setVectorSize(3).setMinCount(0)
// specify layers for the neural network:
// input layer of size 3 (the Word2Vec vector size), one intermediate
// layer of size 5, and output of size 2 (the number of label classes)
val layers = Array[Int](3, 5, 2)
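The constraint can be sketched in plain Scala (this is an illustration, not the Spark API): the first entry of layers is the input size and must equal the length of the feature vector Word2Vec emits, and the last entry must equal the number of label classes. The helper name layersMatch is hypothetical.

```scala
// Hypothetical helper illustrating the dimension requirement that the
// MultilayerPerceptronClassifier enforces: input layer == feature
// vector length, output layer == number of classes.
def layersMatch(layers: Array[Int], featureDim: Int, numClasses: Int): Boolean =
  layers.head == featureDim && layers.last == numClasses

// Word2Vec with setVectorSize(3) yields 3-dimensional features, and the
// indexed "racist" label has 2 classes:
assert(layersMatch(Array(3, 5, 2), featureDim = 3, numClasses = 2))  // fixed layers: OK
assert(!layersMatch(Array(4, 5, 2), featureDim = 3, numClasses = 2)) // original layers: mismatch
```

The same check explains the original failure: without setVectorSize(3), the features were 100-dimensional, so no small input layer would have matched.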