Spark Version: spark-2.0.1-bin-hadoop2.7
Scala: 2.11.8
I am loading a raw CSV into a DataFrame. In the CSV, although a column is supposed to hold dates, the values are written as 20161025 instead of 2016-10-25. The parameter date_format is a list of the column names that need to be converted to yyyy-MM-dd format.
In the following code, I first load the date columns as StringType via the schema, then check whether date_format is non-empty, i.e. whether there are columns that need to be converted from String to Date, and cast each such column using unix_timestamp and to_date. However, in csv_df.show(), the returned rows are all null.
def read_csv(csv_source: String, delimiter: String, is_first_line_header: Boolean,
             schema: StructType, date_format: List[String]): DataFrame = {
  println("|||| Reading CSV Input ||||")
  var csv_df = sqlContext.read
    .format("com.databricks.spark.csv")
    .schema(schema)
    .option("header", is_first_line_header)
    .option("delimiter", delimiter)
    .load(csv_source)
  println("|||| Successfully read CSV. Number of rows -> " + csv_df.count() + " ||||")
  if (date_format.length > 0) {
    for (i <- 0 until date_format.length) {
      csv_df = csv_df.select(to_date(unix_timestamp(
        csv_df(date_format(i)), "yyyy-MM-dd").cast("timestamp")))
      csv_df.show()
    }
  }
  csv_df
}
Returned Top 20 rows:
+-------------------------------------------------------------------------+
|to_date(CAST(unix_timestamp(prom_price_date, yyyy-MM-dd) AS TIMESTAMP))|
+-------------------------------------------------------------------------+
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
+-------------------------------------------------------------------------+
Why am I getting all null?
The pattern passed to unix_timestamp has to match the raw data: unix_timestamp returns null for any string that does not parse with the supplied format, and here the values are in yyyyMMdd while the code parses them with yyyy-MM-dd, so every row comes back null. To convert yyyyMMdd to yyyy-MM-dd you can:
spark.sql("""SELECT DATE_FORMAT(
CAST(UNIX_TIMESTAMP('20161025', 'yyyyMMdd') AS TIMESTAMP), 'yyyy-MM-dd'
)""")
The same conversion with the DataFrame functions API:
date_format(unix_timestamp(col, "yyyyMMdd").cast("timestamp"), "yyyy-MM-dd")
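Applied to the loop in the question, a minimal sketch of the fix (assuming the raw values really are yyyyMMdd strings; convert_dates is a hypothetical helper, not part of the original code). It uses withColumn so each date column is replaced in place, whereas the original select(...) discarded every other column:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{to_date, unix_timestamp}

def convert_dates(df: DataFrame, date_format: List[String]): DataFrame = {
  var csv_df = df
  for (col_name <- date_format) {
    // Parse with the pattern the raw data is actually in (yyyyMMdd);
    // unix_timestamp yields null for any non-matching string.
    csv_df = csv_df.withColumn(
      col_name,
      to_date(unix_timestamp(csv_df(col_name), "yyyyMMdd").cast("timestamp")))
  }
  csv_df
}

The result of to_date is a DateType column, which Spark renders as yyyy-MM-dd; if you need a formatted string instead, use the date_format expression shown above.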