VectorAssembler does not support the StringType type (Scala / Spark)

I have a DataFrame that contains string columns, and I plan to use it as input for k-means with Spark and Scala. I am converting the string-typed columns of the DataFrame with the udf below:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions.udf

val toDouble = udf[Double, String](_.toDouble)
val analysisData = dataframe_mysql
  .withColumn("Event", toDouble(dataframe_mysql("event")))
  .withColumn("Execution", toDouble(dataframe_mysql("execution")))
  .withColumn("Info", toDouble(dataframe_mysql("info")))
val assembler = new VectorAssembler()
  .setInputCols(Array("execution", "event", "info"))
  .setOutputCol("features")
val output = assembler.transform(analysisData)
println(output.select("features", "execution").first())

When I print the analysisData schema, the conversion looks correct. But I get the exception "VectorAssembler does not support the StringType type", which means my values are still strings! How can I convert the values themselves, and not only the schema type?

Thanks.

asked May 30 '16 by Kratos
1 Answer

Indeed, the VectorAssembler transformer does not accept strings. You need to make sure that your columns are of numeric, boolean, or vector type. Check that your udf is doing the right thing, and be sure that none of the assembler's input columns still has StringType.
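As a quick way to verify this, you can inspect the schema and list any input columns that are still strings. This is a minimal sketch; the DataFrame name (analysisData) and the column names are assumed from the question:

import org.apache.spark.sql.types.StringType

// List the assembler's input columns that are still StringType.
val inputCols = Array("execution", "event", "info")
val stillStrings = analysisData.schema.fields
  .filter(f => inputCols.contains(f.name) && f.dataType == StringType)
  .map(_.name)
println(s"Still string-typed: ${stillStrings.mkString(", ")}")

If this prints any column names, the assembler will fail on them.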

To convert a column of a Spark DataFrame to another type, keep it simple and use the cast() DSL function, like so:

import org.apache.spark.sql.types.DoubleType

val analysisData = dataframe_mysql.withColumn("Event", dataframe_mysql("Event").cast(DoubleType))

It should work!
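For completeness, here is a minimal end-to-end sketch under the same assumptions (the DataFrame and column names are taken from the question). One difference worth noting: cast() yields null for values it cannot parse, whereas the toDouble udf would throw a NumberFormatException:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.DoubleType

// Cast each string column to double; unparseable values become null.
val analysisData = dataframe_mysql
  .withColumn("Event", dataframe_mysql("event").cast(DoubleType))
  .withColumn("Execution", dataframe_mysql("execution").cast(DoubleType))
  .withColumn("Info", dataframe_mysql("info").cast(DoubleType))

// Assemble the numeric columns into a single feature vector for k-means.
val assembler = new VectorAssembler()
  .setInputCols(Array("Event", "Execution", "Info"))
  .setOutputCol("features")

val features = assembler.transform(analysisData)
features.select("features").show(5, truncate = false)

If the source data may contain non-numeric strings, the resulting nulls must be dropped or handled, since VectorAssembler errors on null values by default.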

answered Nov 19 '22 by Kevin Eid