I have a dataframe without schema and every column stored as StringType such as:
ID | LOG_IN_DATE | USER
1 | 2017-11-01 | Johns
Now I created a schema as List(("ID", "double"), ("LOG_IN_DATE", "date"), ("USER", "string")) and I would like to apply it to the above DataFrame in Spark 2.0.2 with Scala 2.11.
I already tried:
schema.map(x => df.withColumn(x._1, col(x._1).cast(x._2)))
There's no error while running this, but afterwards when I call df.schema, nothing has changed.
Any idea how I could programmatically apply the schema to df? My friend told me I can use the foldLeft method, but I don't think that's a method in Spark 2.0.2, either on df or on rdd.
If you already have the list List(("ID", "double"), ("LOG_IN_DATE", "date"), ("USER", "string")), you can use select, casting each column to its type from the list.
Your dataframe
val df = Seq(("1", "2017-11-01", "Johns"), ("2", "2018-01-03", "jons2")).toDF("ID", "LOG_IN_DATE", "USER")
Your schema
val schema = List(("ID", "double"), ("LOG_IN_DATE", "date"), ("USER", "string"))
Cast all the columns to their types from the list:
val newColumns = schema.map(c => col(c._1).cast(c._2))
Select all the cast columns:
val newDF = df.select(newColumns:_*)
Print Schema
newDF.printSchema()
root
|-- ID: double (nullable = true)
|-- LOG_IN_DATE: date (nullable = true)
|-- USER: string (nullable = true)
Show Dataframe
newDF.show()
Output:
+---+-----------+-----+
|ID |LOG_IN_DATE|USER |
+---+-----------+-----+
|1.0|2017-11-01 |Johns|
|2.0|2018-01-03 |jons2|
+---+-----------+-----+
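Regarding the foldLeft suggestion: it is not a DataFrame or RDD method, but it is a method on Scala's own List, so it works fine in Spark 2.0.2. A minimal sketch (the SparkSession setup is assumed, not from the original post) that threads the DataFrame through one withColumn cast per schema entry:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("castSchema").getOrCreate()
import spark.implicits._

val df = Seq(("1", "2017-11-01", "Johns"), ("2", "2018-01-03", "jons2"))
  .toDF("ID", "LOG_IN_DATE", "USER")
val schema = List(("ID", "double"), ("LOG_IN_DATE", "date"), ("USER", "string"))

// foldLeft comes from Scala's List, not the DataFrame API. Each step
// returns a new DataFrame with one column cast, and that DataFrame is
// carried into the next step -- which fixes the original attempt, where
// the intermediate DataFrames produced by map were simply discarded.
val foldedDF = schema.foldLeft(df) { case (acc, (name, dtype)) =>
  acc.withColumn(name, col(name).cast(dtype))
}

foldedDF.printSchema()
```

Unlike the select approach, this preserves any columns not listed in the schema, which can matter if the DataFrame has extra columns.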