Spark Structured Streaming automatically converts timestamps to local time

My timestamps are in UTC and ISO 8601 format, but with Structured Streaming they get automatically converted to local time. Is there a way to stop this conversion? I would like to keep them in UTC.

I'm reading JSON data from Kafka and then parsing it using the from_json Spark function.

Input:

{"Timestamp":"2015-01-01T00:00:06.222Z"}

Flow:

SparkSession
  .builder()
  .master("local[*]")
  .appName("my-app")
  .getOrCreate()
  .readStream()
  .format("kafka")
  ... //some magic
  .writeStream()
  .format("console")
  .start()
  .awaitTermination();

Schema:

StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("Timestamp", DataTypes.TimestampType, true)
});
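
For context, the elided parsing step usually looks something like the following minimal sketch, where kafkaStream stands in for the dataset returned by the readStream() chain:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Deserialize the Kafka value as JSON using the schema above,
// then flatten the resulting struct column.
Dataset<Row> parsed = kafkaStream
        .select(from_json(col("value").cast("string"), schema).alias("data"))
        .select("data.*");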

Output:

+--------------------+
|           Timestamp|
+--------------------+
|2015-01-01 01:00:...|
|2015-01-01 01:00:...|
+--------------------+

As you can see, the hour has been shifted forward automatically.

PS: I tried experimenting with the from_utc_timestamp Spark function, but had no luck.
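
An attempt along those lines probably looked like the sketch below (parsed is the hypothetical dataset from the parsing step above). from_utc_timestamp shifts the underlying instant, which is a no-op for data that is already UTC; it does not change the session timezone the console sink uses for rendering, which would explain the lack of luck:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_utc_timestamp;

// Re-interprets the instant (UTC -> UTC, a no-op here); the console
// output is still rendered in the session/JVM timezone.
Dataset<Row> attempt = parsed.withColumn(
        "Timestamp", from_utc_timestamp(col("Timestamp"), "UTC"));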

asked Feb 13 '18 by Martin Brisiak



2 Answers

For me, the following worked:

spark.conf.set("spark.sql.session.timeZone", "UTC")

It tells Spark SQL to use UTC as the default timezone for timestamps. I used it in Spark SQL, for example:

select *, cast('2017-01-01 10:10:10' as timestamp) from someTable

I know it does not work in 2.0.1 but it works in Spark 2.2. I also used it in SQLTransformer and it worked.

I am not sure about streaming though.
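
In Java, the same setting can also be applied while building the session; here is a minimal sketch mirroring the question's setup:

import org.apache.spark.sql.SparkSession;

// Set the SQL session timezone up front so timestamps are rendered
// in UTC instead of the JVM's local zone.
SparkSession spark = SparkSession
        .builder()
        .master("local[*]")
        .appName("my-app")
        .config("spark.sql.session.timeZone", "UTC")
        .getOrCreate();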

answered Oct 18 '22 by astro_asz


Note:

This answer is useful primarily in Spark < 2.2. For newer Spark versions, see the answer by astro_asz above.

However, we should note that as of Spark 2.4.0, spark.sql.session.timeZone doesn't set user.timezone (java.util.TimeZone.getDefault). So setting spark.sql.session.timeZone alone can result in a rather awkward situation where SQL and non-SQL components use different timezone settings.

Therefore I still recommend setting user.timezone explicitly, even if spark.sql.session.timeZone is set.
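
On the driver side, setting both together might look like this sketch (assuming a spark session is in scope; executors still need the JVM option shown below):

import java.util.TimeZone;

// Align the driver JVM default (user.timezone) with the SQL session zone.
// This only covers the driver; executors need
// spark.executor.extraJavaOptions=-Duser.timezone=UTC (see below).
TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
spark.conf().set("spark.sql.session.timeZone", "UTC");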

TL;DR Unfortunately, this is how Spark handles timestamps right now, and there is really no built-in alternative other than operating on epoch time directly, without using date/time utilities.
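
For illustration, operating on epoch time directly can be as simple as a cast (a sketch; parsed and ts_epoch are illustrative names):

import static org.apache.spark.sql.functions.col;

// Casting TimestampType to long yields seconds since the Unix epoch,
// independent of any session or JVM timezone setting.
Dataset<Row> epoch = parsed.withColumn("ts_epoch", col("Timestamp").cast("long"));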

You can find an insightful discussion on the Spark developers list: SQL TIMESTAMP semantics vs. SPARK-18350

The cleanest workaround I've found so far is to set -Duser.timezone to UTC for both the driver and executors. For example, when launching spark-shell:

bin/spark-shell --conf "spark.driver.extraJavaOptions=-Duser.timezone=UTC" \
                --conf "spark.executor.extraJavaOptions=-Duser.timezone=UTC"

or by adjusting configuration files (spark-defaults.conf):

spark.driver.extraJavaOptions      -Duser.timezone=UTC
spark.executor.extraJavaOptions    -Duser.timezone=UTC
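
After launching with the options above, a quick sanity check (a hypothetical snippet, assuming a spark session is in scope):

// Both should print "UTC" if the JVM option and the session setting took effect.
System.out.println(java.util.TimeZone.getDefault().getID());
System.out.println(spark.conf().get("spark.sql.session.timeZone"));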

answered Oct 18 '22 by zero323