I am trying to convert a column that contains Array[String] to String, but I consistently get this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 78.0 failed 4 times, most recent failure: Lost task 0.3 in stage 78.0 (TID 1691, ip-******): java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String;
Here's the piece of code:

```scala
val mkString = udf((arrayCol: Array[String]) => arrayCol.mkString(","))
val dfWithString = df.select($"arrayCol").withColumn("arrayString", mkString($"arrayCol"))
```
A WrappedArray is not an Array (which is a plain old Java array, not a native Scala collection), so the cast in your UDF fails at runtime. You can either change the signature to:

```scala
import scala.collection.mutable.WrappedArray

(arrayCol: WrappedArray[String]) => arrayCol.mkString(",")
```

or use one of its supertypes, like Seq:

```scala
(arrayCol: Seq[String]) => arrayCol.mkString(",")
```
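To see why the cast fails, here is a minimal plain-Scala sketch (no Spark needed; the wrapped value below stands in for what Spark hands to a UDF for an ArrayType column):

```scala
// Spark passes ArrayType column values to a UDF as a Scala Seq
// (a WrappedArray in the Scala 2.11/2.12 builds Spark uses),
// never as a raw Java Array[String].
val fromSpark: Seq[String] = Array("a", "b", "c") // implicitly wrapped, not a raw array

// It is not a Java array, which is why asInstanceOf[Array[String]]
// throws ClassCastException:
println(fromSpark.isInstanceOf[Array[_]]) // false

// A function typed against Seq[String] accepts it directly:
val mkString = (arrayCol: Seq[String]) => arrayCol.mkString(",")
println(mkString(fromSpark)) // a,b,c
```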
In recent Spark versions you can use concat_ws instead:

```scala
import org.apache.spark.sql.functions.concat_ws

df.select(concat_ws(",", $"arrayCol"))
```
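One practical difference between the two approaches: concat_ws skips null elements, while mkString renders them as the literal string "null". A plain-Scala illustration of that behavior (no Spark here; the filter line mimics what concat_ws does):

```scala
val withNull: Seq[String] = Seq("a", null, "c")

// mkString inside a UDF renders null elements literally:
println(withNull.mkString(",")) // a,null,c

// concat_ws(",", $"arrayCol") in Spark would skip nulls and yield "a,c";
// the equivalent filtering in plain Scala:
println(withNull.filter(_ != null).mkString(",")) // a,c
```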
This code works for me:

```scala
import scala.collection.mutable.WrappedArray

df.select("wifi_ids").rdd.map(row =>
  row.get(0).asInstanceOf[WrappedArray[WrappedArray[String]]].toSeq.map(x => x.toSeq.apply(0)))
```

In your case, I guess it is:

```scala
val mkString = udf((arrayCol: WrappedArray[String]) => arrayCol.toArray.mkString(","))
val dfWithString = df.select($"arrayCol").withColumn("arrayString", mkString($"arrayCol"))
```