
Why can't I display prediction column of Spark MultilayerPerceptronClassifier?

I am using Spark's MultilayerPerceptronClassifier. It generates a column 'prediction' in 'predictions'. When I try to show it, I get the error:

SparkException: Failed to execute user defined function($anonfun$1: (vector) => double) ...
Caused by: java.lang.IllegalArgumentException: requirement failed: A & B Dimension mismatch!

Other columns, for example vector, display OK. Part of the predictions schema:

|-- vector: vector (nullable = true)
|-- prediction: double (nullable = true)

My code is:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.feature.{StringIndexer, Word2Vec}
import spark.implicits._ // enables the 'racist column syntax; assumes a SparkSession named spark

// racist is boolean, needs to be string:
val train2 = train.withColumn("racist", 'racist.cast("String"))
val test2 = test.withColumn("racist", 'racist.cast("String"))

val indexer = new StringIndexer().setInputCol("racist").setOutputCol("indexracist")

val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector") //.setVectorSize(3).setMinCount(0)

val layers = Array[Int](4, 5, 2)

val mpc = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
  .setFeaturesCol("vector")
  .setLabelCol("indexracist")

val pipeline = new Pipeline().setStages(Array(indexer, word2Vec, mpc))

val model = pipeline.fit(train2)

val predictions = model.transform(test2)

predictions.select("prediction").show()

EDIT: The problem in the proposed similar question was

val layers = Array[Int](0, 0, 0, 0) 

which is not the case here, nor is it the same error.

EDIT 2: Part 0 of train and test is saved in Parquet format here.

asked Jul 20 '17 09:07 by schoon




1 Answer

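Adding .setVectorSize(3).setMinCount(0) to Word2Vec and changing the layers to Array[Int](3, 5, 2) made it work. The first entry of layers must equal the size of the feature vectors. Word2Vec produces 100-dimensional vectors by default, which does not match an input layer of size 4: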

val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector").setVectorSize(3).setMinCount(0)

// specify layers for the neural network:
// input layer of size 3 (the Word2Vec vector size), one hidden layer of size 5,
// and output of size 2 (classes)
val layers = Array[Int](3, 5, 2)
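
The rule behind this fix can be written as a small check to run before fitting. This is only a sketch in plain Scala: `LayerCheck` and `validateLayers` are hypothetical names, not part of Spark, and the sizes passed in are assumptions taken from this question (Word2Vec's default vector size of 100, two label classes).

```scala
// Hypothetical helper, not a Spark API: checks an MLP layer spec against
// the feature vector size and the number of label classes.
object LayerCheck {
  def validateLayers(
      layers: Array[Int],
      featureSize: Int,
      numClasses: Int): Either[String, Array[Int]] = {
    if (layers.length < 2)
      Left("layers needs at least an input size and an output size")
    else if (layers.head != featureSize)
      Left(s"input layer is ${layers.head} but feature vectors have size $featureSize")
    else if (layers.last != numClasses)
      Left(s"output layer is ${layers.last} but there are $numClasses classes")
    else
      Right(layers)
  }

  def main(args: Array[String]): Unit = {
    // Question's setup: Word2Vec left at its default vectorSize of 100.
    println(validateLayers(Array(4, 5, 2), featureSize = 100, numClasses = 2))
    // Answer's setup: setVectorSize(3) and an input layer of size 3.
    println(validateLayers(Array(3, 5, 2), featureSize = 3, numClasses = 2).isRight)
  }
}
```

In a real pipeline the feature size could probably be read from the Word2Vec stage via `getVectorSize` rather than hard-coded, so the two values cannot drift apart.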
answered Sep 19 '22 02:09 by schoon