I am new to Scala. I have a DataFrame with fields
ID:string, Time:timestamp, Items:array(struct(name:string,ranking:long))
I want to convert each row of the Items field to a hashmap, with the name as the key. I am not very sure how to do this.
To convert a Spark DataFrame column to a List, first select() the column you want, then use the map() transformation to convert each Row to a String, and finally collect() the data to the driver, which returns an Array[String].
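For example, a minimal sketch of that select/map/collect pipeline, assuming a DataFrame named df with the question's schema and a SparkSession named spark in scope:
import spark.implicits._
// select() narrows to the ID column, map() extracts the String from each Row,
// and collect() brings the results back to the driver as Array[String]
val ids: Array[String] = df.select($"ID").map(_.getString(0)).collect()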
Solution: PySpark provides a create_map() function that takes a sequence of key and value columns as arguments and returns a MapType column, so it can be used to convert a DataFrame struct column to a map. A struct column has a StructType, while a MapType column stores key-value pairs.
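In Scala the counterpart is functions.map(), which likewise builds a MapType column from alternating key and value columns. A sketch against the question's schema (the item and ItemMap names are just for illustration); note that each exploded struct here becomes a one-entry map:
import org.apache.spark.sql.functions.{col, explode, map}
// Explode the array so each struct gets its own row, then build a
// single-entry map from the struct's name and ranking fields
val mapped = df
  .select(col("ID"), explode(col("Items")).as("item"))
  .select(col("ID"), map(col("item.name"), col("item.ranking")).as("ItemMap"))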
StructType and StructField are used to define a schema, or part of one, for a DataFrame. A StructField defines the name, data type, and nullable flag for a column; a StructType is the built-in data type that holds the collection of StructField objects making up the schema.
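As an illustration, the question's schema could be declared explicitly like this (a sketch using the field names from the question):
import org.apache.spark.sql.types._
// Schema matching the question: ID:string, Time:timestamp,
// Items:array(struct(name:string, ranking:long))
val schema = StructType(Seq(
  StructField("ID", StringType, nullable = true),
  StructField("Time", TimestampType, nullable = true),
  StructField("Items", ArrayType(StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("ranking", LongType, nullable = true)
  ))), nullable = true)
))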
This can be done using a UDF:
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
// Sample data:
val df = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")
// Create a UDF converting an array of (String, Long) structs to Map[String, Long]
val arrayToMap = udf[Map[String, Long], Seq[Row]] { array =>
  array.map { case Row(key: String, value: Long) => (key, value) }.toMap
}
// Apply the UDF, replacing the Items array with a map column
val result = df.withColumn("Items", arrayToMap($"Items"))
result.show(false)
// +---+----+---------------------+
// |ID |Time|Items |
// +---+----+---------------------+
// |id1|t1 |Map(n1 -> 4, n2 -> 5)|
// |id2|t2 |Map(n3 -> 6, n4 -> 7)|
// +---+----+---------------------+
I can't see a way to do this with Spark's built-in functions alone, i.e. without a UDF.