Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get day of week in SparkSQL?

Tags:

apache-spark

I am trying to select all record recorded at Sunday through SparkSQL. I have the following try but in vain.

SELECT * FROM mytable WHERE DATEPART(WEEKDAY, create_time) = 0
SELECT * FROM mytable WHERE strftime("%w", create_time) = 0

How to get day of week in SparkSQL?

like image 858
Wen Hsiao Avatar asked Jul 29 '14 00:07

Wen Hsiao


People also ask

How do you get weekday on Pyspark?

We can easily get the day of week using date_format() .

How do you get the day from date Spark?

In order to get day of month, day of year and day of week from date in pyspark we will be using dayofmonth(), dayofyear() and dayofweek() function respectively. dayofyear() Function with column name as argument extracts nth day of year from date in pyspark.

What are the various data sources available in Sparksql?

Spark SQL can automatically capture the schema of a JSON dataset and load it as a DataFrame. Hive comes bundled with the Spark library as HiveContext, which inherits from SQLContext. Parquet is a columnar format, supported by many data processing systems.


3 Answers

SPARK 1.5.0 has a date_format function that accepts a format as an argument. This format returns a name of a week day from a timestamp:

select date_format(my_timestamp, 'EEEE') from ....

Result: e.g. 'Tuesday'

like image 67
Olena Avatar answered Oct 23 '22 07:10

Olena


If the create_time is in the format of UTC, you can use the following to filter out specific days in SparkSQL. I used Spark 1.6.1:

select id,  date_format(from_unixtime(created_utc), 'EEEE') from testTable where date_format(from_unixtime(created_utc), 'EEEE') == "Wednesday"

If you specify 'EEEE', the day of the week is spelled out completely. You can use 'E' to specify the shortened version, e.g. Wed. You can find more info here: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

like image 35
MrChristine Avatar answered Oct 23 '22 07:10

MrChristine


Here's an example that shows the output from the different options. Suppose you have this DataFrame.

+----------+
| some_date|
+----------+
|2021-01-10|
|2021-01-11|
|2021-01-12|
|2021-01-13|
|2021-01-14|
|2021-01-15|
|2021-01-16|
|      null|
+----------+

Here are the results of using the dayofweek, date_format, and dayOfWeekStr (from spark-daria) functions.

import com.github.mrpowers.spark.daria.sql.functions._

df
  .withColumn("dayofweek", dayofweek(col("some_date")))
  .withColumn("date_format", date_format(col("some_date"), "EEEE"))
  .withColumn("dayOfWeekStr", dayOfWeekStr(col("dayofweek")))
  .show()
+----------+---------+-----------+------------+
| some_date|dayofweek|date_format|dayOfWeekStr|
+----------+---------+-----------+------------+
|2021-01-10|        1|     Sunday|         Sun|
|2021-01-11|        2|     Monday|         Mon|
|2021-01-12|        3|    Tuesday|         Tue|
|2021-01-13|        4|  Wednesday|         Wed|
|2021-01-14|        5|   Thursday|         Thu|
|2021-01-15|        6|     Friday|         Fri|
|2021-01-16|        7|   Saturday|         Sat|
|      null|     null|       null|        null|
+----------+---------+-----------+------------+

The daysofweek output is best for date addition with date_add or date_sub, as described in this post.

The date_format solution is best for customizing the dates for a given format (e.g. for a human readable deliverable).

Some date functions, like next_day take a day in string form as argument, and that's when dayOfWeekStr comes in handy.

like image 39
Powers Avatar answered Oct 23 '22 07:10

Powers