Change schema of existing dataframe

Tags: dataframe, scala, apache-spark

I want to change the schema of an existing DataFrame, but I get an error while changing it. Is it possible to change the schema of an existing DataFrame?

import org.apache.spark.sql.types._

// Target schema; the field names follow the expected result below
val customSchema = StructType(
  Array(
    StructField("data_typ_cd", StringType, nullable = false),
    StructField("data_typ_desc", StringType, nullable = false),
    StructField("proc_dt", IntegerType, nullable = false),
    StructField("cyc_dt", DateType, nullable = false)
  ))

readDF currently looks like this:
+------------+--------------------+-----------+--------------------+
|DatatypeCode|         Description|monthColNam|     timeStampColNam|
+------------+--------------------+-----------+--------------------+
|       03099|Volumetric/Expand...|     201867|2018-05-31 18:25:...|
|       03307|  Elapsed Day Factor|     201867|2018-05-31 18:25:...|
+------------+--------------------+-----------+--------------------+

val rows= readDF.rdd
val readDF1 = sparkSession.createDataFrame(rows,customSchema)

Expected result:

newdf should look like this:
+------------+--------------------+-----------+--------------------+
| data_typ_cd|       data_typ_desc|    proc_dt|              cyc_dt|
+------------+--------------------+-----------+--------------------+
|       03099|Volumetric/Expand...|     201867|2018-05-31 18:25:...|
|       03307|  Elapsed Day Factor|     201867|2018-05-31 18:25:...|
+------------+--------------------+-----------+--------------------+

Any help will be appreciated.


asked May 31 '18 13:05

user9318576

2 Answers

You can do something like this to change a column's data type from one type to another.

I created a DataFrame similar to yours:

import sparkSession.sqlContext.implicits._
import org.apache.spark.sql.types._

// Sample data shaped like the question's: every column is a string
var df = Seq(
  ("03099", "Volumetric/Expand...", "201867", "2018-05-31 18:25:00"),
  ("03307", "Elapsed Day Factor", "201867", "2018-05-31 18:25:00")
).toDF("DatatypeCode", "data_typ", "proc_date", "cyc_dt")

df.printSchema()
df.show()

This gives me the following output:

root
 |-- DatatypeCode: string (nullable = true)
 |-- data_typ: string (nullable = true)
 |-- proc_date: string (nullable = true)
 |-- cyc_dt: string (nullable = true)

+------------+--------------------+---------+-------------------+
|DatatypeCode|            data_typ|proc_date|             cyc_dt|
+------------+--------------------+---------+-------------------+
|       03099|Volumetric/Expand...|   201867|2018-05-31 18:25:00|
|       03307|  Elapsed Day Factor|   201867|2018-05-31 18:25:00|
+------------+--------------------+---------+-------------------+

As the schema above shows, all the columns are of type String. To rename DatatypeCode and change proc_date to Integer and cyc_dt to Date, I do the following:

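// Rename a column without changing its type
df = df.withColumnRenamed("DatatypeCode", "data_type_code")

// Cast proc_date to Integer under a new name, then drop the original string column
df = df.withColumn("proc_date_new", df("proc_date").cast(IntegerType)).drop("proc_date")

// Cast cyc_dt to Date (the time-of-day part is discarded), then drop the original
df = df.withColumn("cyc_dt_new", df("cyc_dt").cast(DateType)).drop("cyc_dt")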

When you then check the schema of this DataFrame:

df.printSchema()

it prints the following, with the new column names and types:

root
 |-- data_type_code: string (nullable = true)
 |-- data_typ: string (nullable = true)
 |-- proc_date_new: integer (nullable = true)
 |-- cyc_dt_new: date (nullable = true)
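For comparison, here is a minimal sketch that produces the question's expected column names and types in a single select, instead of a chain of withColumn/drop calls. It assumes Spark 2.x or later; the session setup, the app name and the names raw and newDf are my own choices for illustration, and the input column names are taken from the question's readDF.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, IntegerType}

// Reuses the running session in spark-shell; creates a local one otherwise
val spark = SparkSession.builder().master("local[*]").appName("rename-and-cast").getOrCreate()
import spark.implicits._

// Same sample data as the question: every column starts out as a string
val raw = Seq(
  ("03099", "Volumetric/Expand...", "201867", "2018-05-31 18:25:00"),
  ("03307", "Elapsed Day Factor", "201867", "2018-05-31 18:25:00")
).toDF("DatatypeCode", "Description", "monthColNam", "timeStampColNam")

// One select renames every column and casts where needed, keeping the column order
val newDf = raw.select(
  col("DatatypeCode").as("data_typ_cd"),
  col("Description").as("data_typ_desc"),
  col("monthColNam").cast(IntegerType).as("proc_dt"),
  col("timeStampColNam").cast(DateType).as("cyc_dt")
)

newDf.printSchema()
newDf.show()
```

Because every column is listed explicitly, the result has exactly the columns you name, in that order, and no intermediate columns need to be dropped.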

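If only the names need to change and the columns can stay strings, Dataset.toDF(colNames: _*) renames every column positionally in one call. A small sketch, assuming the same string-only sample data; newNames and renamed are hypothetical names used only for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("rename-only").getOrCreate()
import spark.implicits._

val df = Seq(
  ("03099", "Volumetric/Expand...", "201867", "2018-05-31 18:25:00"),
  ("03307", "Elapsed Day Factor", "201867", "2018-05-31 18:25:00")
).toDF("DatatypeCode", "Description", "monthColNam", "timeStampColNam")

// toDF renames by position: the number of names must equal the number of columns
val newNames = Seq("data_typ_cd", "data_typ_desc", "proc_dt", "cyc_dt")
val renamed = df.toDF(newNames: _*)

renamed.printSchema()
```

Note that this only changes names; the types stay as they were, so any casts still have to be applied separately.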
answered Oct 24 '22 22:10

Prasad Khode

You cannot change the schema like this. The schema object passed to createDataFrame has to match the data, not the other way around:

To parse timestamp data, use the corresponding functions; see, for example, "Better way to convert a string field into timestamp in Spark".
To change other types, use the cast method; see, for example, "how to change a Dataframe column from String type to Double type in pyspark".
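To illustrate that the schema has to match the data: a sketch, assuming the question's string-only readDF, that converts each Row's values to the JVM types the schema declares before calling createDataFrame. The conversion logic (toInt, and keeping only the first 10 characters as the date) is my assumption about the data format, not something Spark does for you.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("schema-must-match").getOrCreate()
import spark.implicits._

// String-only input, like readDF in the question
val readDF = Seq(
  ("03099", "Volumetric/Expand...", "201867", "2018-05-31 18:25:00"),
  ("03307", "Elapsed Day Factor", "201867", "2018-05-31 18:25:00")
).toDF("DatatypeCode", "Description", "monthColNam", "timeStampColNam")

val customSchema = StructType(Array(
  StructField("data_typ_cd", StringType, nullable = false),
  StructField("data_typ_desc", StringType, nullable = false),
  StructField("proc_dt", IntegerType, nullable = false),
  StructField("cyc_dt", DateType, nullable = false)
))

// createDataFrame does not convert values: each Row must already hold the JVM type
// the schema declares (Int for IntegerType, java.sql.Date for DateType)
val typedRows = readDF.rdd.map { r =>
  Row(
    r.getString(0),
    r.getString(1),
    r.getString(2).toInt,
    // Assumption: timeStampColNam always starts with yyyy-MM-dd
    java.sql.Date.valueOf(r.getString(3).take(10))
  )
}

val readDF1 = spark.createDataFrame(typedRows, customSchema)
readDF1.printSchema()
readDF1.show()
```

Passing the original string rows with this schema is what fails, because a String cannot be stored where IntegerType or DateType is declared.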

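Following the two suggestions above, a sketch that uses to_timestamp for the timestamp column and cast for the numeric one. It assumes Spark 2.2 or later (where to_timestamp accepts a format string) and the yyyy-MM-dd HH:mm:ss format seen in the sample data; the column names cyc_ts, cyc_dt and proc_dt are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("parse-and-cast").getOrCreate()
import spark.implicits._

val df = Seq(
  ("03099", "201867", "2018-05-31 18:25:00"),
  ("03307", "201867", "2018-05-31 18:25:00")
).toDF("DatatypeCode", "monthColNam", "timeStampColNam")

// Parse the string with an explicit pattern, derive a date from the timestamp,
// cast the numeric column, then drop the original string columns
val parsed = df
  .withColumn("cyc_ts", to_timestamp(col("timeStampColNam"), "yyyy-MM-dd HH:mm:ss"))
  .withColumn("cyc_dt", to_date(col("cyc_ts")))
  .withColumn("proc_dt", col("monthColNam").cast(IntegerType))
  .drop("monthColNam", "timeStampColNam")

parsed.printSchema()
parsed.show()
```

Values that do not match the pattern become null rather than raising an error, so it can be worth checking for nulls after parsing.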
answered Oct 24 '22 20:10

user9876218


