VectorAssembler does not support the StringType type (Scala / Spark)

I have a DataFrame that contains string columns, and I plan to use it as input for k-means with Spark and Scala. I am converting the string-typed columns of the DataFrame with the udf below:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions.udf

val toDouble = udf[Double, String](_.toDouble)
val analysisData = dataframe_mysql
  .withColumn("Event", toDouble(dataframe_mysql("event")))
  .withColumn("Execution", toDouble(dataframe_mysql("execution")))
  .withColumn("Info", toDouble(dataframe_mysql("info")))
val assembler = new VectorAssembler()
  .setInputCols(Array("execution", "event", "info"))
  .setOutputCol("features")
val output = assembler.transform(analysisData)
println(output.select("features", "execution").first())

When I print the analysisData schema, the conversion looks correct. But I get the exception "VectorAssembler does not support the StringType type", which means my values are still strings! How can I convert the values themselves, and not only the schema type?

Thanks.

asked May 30 '16 by Kratos
1 Answer

Indeed, the VectorAssembler transformer does not accept strings. You need to make sure that your columns are of numeric, boolean, or vector type. Check that your udf is doing the right thing, and be sure that none of the assembler's input columns still has StringType.
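As a quick way to verify this, you can inspect the schema and list any input columns that are still strings. This is a minimal sketch; the DataFrame name (analysisData) and the column names are assumed from the question:

import org.apache.spark.sql.types.StringType

// List the assembler's input columns that are still StringType.
val inputCols = Array("execution", "event", "info")
val stillStrings = analysisData.schema.fields
  .filter(f => inputCols.contains(f.name) && f.dataType == StringType)
  .map(_.name)
println(s"Still string-typed: ${stillStrings.mkString(", ")}")

If this prints any column names, the assembler will fail on them.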

To convert a column of a Spark DataFrame to another type, keep it simple and use the cast() DSL function, like so:

import org.apache.spark.sql.types.DoubleType

val analysisData = dataframe_mysql.withColumn("Event", dataframe_mysql("Event").cast(DoubleType))

It should work!
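For completeness, here is a minimal end-to-end sketch under the same assumptions (the DataFrame and column names are taken from the question). One difference worth noting: cast() yields null for values it cannot parse, whereas the toDouble udf would throw a NumberFormatException:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.DoubleType

// Cast each string column to double; unparseable values become null.
val analysisData = dataframe_mysql
  .withColumn("Event", dataframe_mysql("event").cast(DoubleType))
  .withColumn("Execution", dataframe_mysql("execution").cast(DoubleType))
  .withColumn("Info", dataframe_mysql("info").cast(DoubleType))

// Assemble the numeric columns into a single feature vector for k-means.
val assembler = new VectorAssembler()
  .setInputCols(Array("Event", "Execution", "Info"))
  .setOutputCol("features")

val features = assembler.transform(analysisData)
features.select("features").show(5, truncate = false)

If the source data may contain non-numeric strings, the resulting nulls must be dropped or handled, since VectorAssembler errors on null values by default.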

answered Nov 19 '22 by Kevin Eid