
Converting multiple different columns to Map column with Spark Dataframe scala

I have a data frame with columns: user, address1, address2, address3, phone1, phone2 and so on. I want to convert this data frame into one with columns user, address, phone, where address = Map("address1" -> address1.value, "address2" -> address2.value, "address3" -> address3.value).

I was able to convert the columns to map using:

val mapData = List("address1", "address2", "address3")
df.map(_.getValuesMap[Any](mapData))

but I am not sure how to add this back to my data frame.

I am new to Spark and Scala and could really use some help here.

asked Oct 18 '15 by Jds

People also ask

How do I combine columns in spark data frame?

Spark SQL provides the concat() function to concatenate two or more DataFrame columns into a single column. It can also take columns of different data types and concatenate them into a single column; for example, it supports String, Int, Boolean and also arrays.
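
A minimal sketch (column names and data here are made up for illustration, assuming a spark-shell session):

import org.apache.spark.sql.functions.{concat, lit, col}

val people = sc.parallelize(Seq(("Jane", "Doe"))).toDF("first_name", "last_name")

// concat joins the two name columns, with a literal space between them,
// into a single string column: "Jane Doe"
people.select(
  concat(col("first_name"), lit(" "), col("last_name")).alias("full_name")
).show()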

How do I create a column map in spark?

We can create a map column type with the createMapType() function on the DataTypes class. This method takes two arguments, keyType and valueType, and both must be types that extend DataType.
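
For example, a small sketch (the schema and field names are hypothetical):

import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// a MapType with String keys and String values
val addressMapType = DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType)

// it can then be used when defining a schema explicitly
val schema = StructType(Seq(
  StructField("user", DataTypes.LongType),
  StructField("address", addressMapType)
))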

How do I select multiple columns in spark data frame?

You can select one or more columns of a Spark DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame contents.
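
A quick sketch, reusing the df and column names from the question above:

import org.apache.spark.sql.functions.col

// select a single column by name
df.select("user").show()

// select several columns by name
df.select("user", "address1", "phone1").show()

// or using Column objects
df.select(col("user"), col("address1")).show()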


1 Answer

Spark >= 2.0

You can skip the UDF and use the map SQL function (create_map in Python):

import org.apache.spark.sql.functions.{map, lit, col}

// build an alternating list of key literals and value columns:
// lit("address1"), col("address1"), lit("address2"), col("address2"), ...
df.select(
  map(mapData.map(c => lit(c) :: col(c) :: Nil).flatten: _*).alias("a_map")
)
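
If you also want to keep the other columns, they can go in the same select, for example (a sketch reusing the user column from the question):

df.select(
  col("user"),
  map(mapData.map(c => lit(c) :: col(c) :: Nil).flatten: _*).alias("address")
)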

Spark < 2.0

As far as I know there is no direct way to do it. You can use a UDF like this:

import org.apache.spark.sql.functions.{udf, array, lit, col}

val df = sc.parallelize(Seq(
  (1L, "addr1", "addr2", "addr3")
)).toDF("user", "address1", "address2", "address3")

// zip keys with values and drop entries whose value is null
val asMap = udf((keys: Seq[String], values: Seq[String]) =>
  keys.zip(values).filter {
    case (_, null) => false
    case _ => true
  }.toMap)

// the column names as string literals (map keys) and the columns themselves (map values)
val keys = array(mapData.map(lit): _*)
val values = array(mapData.map(col): _*)

val dfWithMap = df.withColumn("address", asMap(keys, values))
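
Individual values can then be looked up from the map column by key, for example:

// getItem extracts a value from the map column by key
dfWithMap.select(col("address").getItem("address1")).show()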

Another option, which doesn't require UDFs, is to use a struct field instead of a map (struct also comes from org.apache.spark.sql.functions):

val dfWithStruct = df.withColumn("address", struct(mapData.map(col): _*))

The biggest advantage is that it can easily handle values of different types.
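
Fields of the struct are then addressed with dot notation rather than map keys, for example:

// access a nested struct field with dot notation
dfWithStruct.select(col("address.address1")).show()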

answered Oct 22 '22 by zero323