I have a row from a data frame and I want to convert it to a Map[String, Any] that maps column names to the values in the row for that column.
Is there an easy way to do it?
I did it for string values like
def rowToMap(row:Row): Map[String, String] = {
row.schema.fieldNames.map(field => field -> row.getAs[String](field)).toMap
}
val myRowMap = rowToMap(myRow)
If the row contains other values, not specific ones like String then the code gets messier because the row does not have a a method .get(field)
Any ideas?
We can create a map column using createMapType() function on the DataTypes class. This method takes two arguments keyType and valueType as mentioned above and these two arguments should be of a type that extends DataType. This snippet creates “mapCol” object of type MapType with key and values as String type.
Spark map() is a transformation operation that is used to apply the transformation on every element of RDD, DataFrame, and Dataset and finally returns a new RDD/Dataset respectively. In this article, you will learn the syntax and usage of the map() transformation with an RDD & DataFrame example.
Delta Lake with Apache Spark using Scala Maps are also called Hash tables. There are two kinds of Maps, the immutable and the mutable. The difference between mutable and immutable objects is that when an object is immutable, the object itself can't be changed. By default, Scala uses the immutable Map.
map and flatMap are similar, in the sense they take a line from the input RDD and apply a function on it. The way they differ is that the function in map returns only one element, while function in flatMap can return a list of elements (0 or more) as an iterator. Also, the output of the flatMap is flattened.
You can use getValuesMap
:
val df = Seq((1, 2.0, "a")).toDF("A", "B", "C")
val row = df.first
To get Map[String, Any]
:
row.getValuesMap[Any](row.schema.fieldNames)
// res19: Map[String,Any] = Map(A -> 1, B -> 2.0, C -> a)
Or you can get Map[String, AnyVal]
for this simple case since the values are not complex objects
row.getValuesMap[AnyVal](row.schema.fieldNames)
// res20: Map[String,AnyVal] = Map(A -> 1, B -> 2.0, C -> a)
Note: the returned value type of the getValuesMap
can be labelled as any type, so you can not rely on it to figure out what data types you have but need to keep in mind what you have from the beginning instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With