I am using Spark's MultilayerPerceptronClassifier. This generates a column 'prediction' in 'predictions'. When I try to show it I get the error:
SparkException: Failed to execute user defined function($anonfun$1: (vector) => double) ...
Caused by: java.lang.IllegalArgumentException: requirement failed: A & B Dimension mismatch!
Other columns, for example 'vector', display fine. Part of the predictions schema:
|-- vector: vector (nullable = true)
|-- prediction: double (nullable = true)
My code is:
//racist is boolean, needs to be string:
val train2 = train.withColumn("racist", 'racist.cast("String"))
val test2 = test.withColumn("racist", 'racist.cast("String"))
val indexer = new StringIndexer().setInputCol("racist").setOutputCol("indexracist")
val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector") //.setVectorSize(3).setMinCount(0)
val layers = Array[Int](4, 5, 2)
val mpc = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100).setFeaturesCol("vector").setLabelCol("indexracist")
val pipeline = new Pipeline().setStages(Array(indexer, word2Vec, mpc))
val model = pipeline.fit(train2)
val predictions = model.transform(test2)
predictions.select("prediction").show()
EDIT: the proposed similar question's problem was
val layers = Array[Int](0, 0, 0, 0)
which is not the case here, nor is it the same error.
EDIT AGAIN: part0 of train and test are saved in Parquet format here.
Adding .setVectorSize(3).setMinCount(0) and changing the layers to val layers = Array[Int](3, 5, 2) made it work. The first entry of layers must equal the dimension of the feature vector: with setVectorSize commented out, Word2Vec produced vectors of its default size (100) while the input layer expected 4, hence the dimension mismatch.
val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector").setVectorSize(3).setMinCount(0)
// specify layers for the neural network:
// input layer of size 3 (the Word2Vec vector size), one intermediate
// layer of size 5, and output of size 2 (the number of label classes)
val layers = Array[Int](3, 5, 2)
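The constraint can be sketched in plain Scala (this is an illustration, not the Spark API): the first entry of layers is the input size and must equal the length of the feature vector Word2Vec emits, and the last entry must equal the number of label classes. The helper name layersMatch is hypothetical.

```scala
// Hypothetical helper illustrating the dimension requirement that the
// MultilayerPerceptronClassifier enforces: input layer == feature
// vector length, output layer == number of classes.
def layersMatch(layers: Array[Int], featureDim: Int, numClasses: Int): Boolean =
  layers.head == featureDim && layers.last == numClasses

// Word2Vec with setVectorSize(3) yields 3-dimensional features, and the
// indexed "racist" label has 2 classes:
assert(layersMatch(Array(3, 5, 2), featureDim = 3, numClasses = 2))  // fixed layers: OK
assert(!layersMatch(Array(4, 5, 2), featureDim = 3, numClasses = 2)) // original layers: mismatch
```

The same check explains the original failure: without setVectorSize(3), the features were 100-dimensional, so no small input layer would have matched.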