Convert Row to map in spark scala

I have a row from a data frame and I want to convert it to a Map[String, Any] that maps column names to the values in the row for that column.

Is there an easy way to do it?

I did it for string values like

import org.apache.spark.sql.Row

// Only correct when every column holds a String
def rowToMap(row: Row): Map[String, String] = {
  row.schema.fieldNames.map(field => field -> row.getAs[String](field)).toMap
}

val myRowMap = rowToMap(myRow)

If the row contains values of other types, not just String, the code gets messier because Row does not have a .get(field) method that takes a column name.

Any ideas?

Sorin Bolos asked Sep 11 '17 12:09


1 Answer

You can use getValuesMap:

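import spark.implicits._ // needed for toDF; `spark` is the active SparkSession (predefined in spark-shell)
val df = Seq((1, 2.0, "a")).toDF("A", "B", "C")
val row = df.first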

To get Map[String, Any]:

row.getValuesMap[Any](row.schema.fieldNames)
// res19: Map[String,Any] = Map(A -> 1, B -> 2.0, C -> a)

You can also label the result as Map[String, AnyVal]. This runs without error, but only because the type parameter is not checked at runtime: the value for C is a String, which is not an AnyVal.

row.getValuesMap[AnyVal](row.schema.fieldNames)
// res20: Map[String,AnyVal] = Map(A -> 1, B -> 2.0, C -> a)

Note: getValuesMap accepts any type parameter and does not check the values against it, so the declared value type tells you nothing about the actual data types in the row. Use the row's schema, or what you already know about the data, to determine them.
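
One way to keep track of the real types is to read them from the row's schema instead of the type parameter. The following is a minimal sketch, not part of the original answer: rowToTypedMap and describe are hypothetical helper names, and the local SparkSession setup is an assumption (in spark-shell, getOrCreate simply reuses the existing session).

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.DataType

// Assumption: a local session is acceptable for this demo.
val spark = SparkSession.builder().master("local[1]").appName("row-to-map").getOrCreate()
import spark.implicits._

val row: Row = Seq((1, 2.0, "a")).toDF("A", "B", "C").first()

// Hypothetical helper: column name -> (DataType declared in the schema, value).
def rowToTypedMap(row: Row): Map[String, (DataType, Any)] =
  row.schema.fields.zipWithIndex.map { case (field, i) =>
    (field.name, (field.dataType, row.get(i)))
  }.toMap

val typed = rowToTypedMap(row)

// Hypothetical helper: describe a value by its runtime type.
// Extend the match with whatever column types your data contains.
def describe(value: Any): String = value match {
  case null      => "null"
  case i: Int    => s"Int($i)"
  case d: Double => s"Double($d)"
  case s: String => s"String($s)"
  case other     => other.getClass.getName
}

val described = row.getValuesMap[Any](row.schema.fieldNames)
  .map { case (name, value) => name -> describe(value) }
```

Here typed carries the schema's DataType next to each value, while described shows the alternative of pattern matching on the runtime values of a Map[String, Any]. Which one fits better probably depends on whether you need Spark's declared types or only the JVM types of the values.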

Psidom answered Oct 01 '22 16:10