I have a JSON data file which contains one property [creationDate] which is a Unix epoch in "long" number type. The Apache Spark DataFrame schema looks like below:
root
 |-- creationDate: long (nullable = true)
 |-- id: long (nullable = true)
 |-- postTypeId: long (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- title: string (nullable = true)
 |-- viewCount: long (nullable = true)
I would like to do some groupBy on "creationDate_Year", which needs to be derived from "creationDate".
What's the easiest way to do this kind of conversion in a DataFrame using Java?
The from_unixtime() SQL function is used to convert, or cast, Epoch time to a timestamp string; it takes the Epoch time (in seconds) as its first argument and a format string as its optional second argument. When the current time is wanted, unix_timestamp() can be passed as the first argument, since it returns the current timestamp as Epoch time (a Long) in seconds.
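For example, a minimal sketch of that two-argument form in Java (the DataFrame "df" is assumed, and the pattern string is just for illustration):

import static org.apache.spark.sql.functions.from_unixtime;
import static org.apache.spark.sql.functions.unix_timestamp;

// unix_timestamp() supplies the current time as Epoch seconds;
// the second argument of from_unixtime() controls the output format.
df.select(from_unixtime(unix_timestamp(), "yyyy-MM-dd HH:mm:ss"));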
To convert from epoch to a human-readable date in plain Java:

String date = new java.text.SimpleDateFormat("MM/dd/yyyy HH:mm:ss").format(new java.util.Date(epoch * 1000));

This assumes the epoch is in seconds; remove the '* 1000' if it is already in milliseconds. The Delphi equivalent is:

myString := DateTimeToStr(UnixToDateTime(Epoch));

where Epoch is a signed integer.
To change a Spark SQL DataFrame column from one data type to another, you should use the cast() function of the Column class; you can use it with withColumn(), select(), selectExpr(), and SQL expressions.
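A minimal sketch of both styles, assuming the schema above (casting "viewCount" to a string is just for illustration):

// cast() through withColumn(): replace the column with a string version
DataFrame casted = df.withColumn("viewCount", df.col("viewCount").cast("string"));
// the same cast through selectExpr() and a SQL expression
DataFrame casted2 = df.selectExpr("cast(viewCount as string) as viewCount");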
After checking the Spark DataFrame API and SQL functions, I came up with the snippet below:
import static org.apache.spark.sql.functions.from_unixtime;

DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");
DataFrame df_DateConverted = df.withColumn("creationDt", from_unixtime(df.col("creationDate").divide(1000)));
The reason the "creationDate" column is divided by 1000 is that the time units differ: the original "creationDate" is a Unix epoch in milliseconds, whereas Spark SQL's from_unixtime() is designed to handle a Unix epoch in seconds.
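From there, getting the "creationDate_Year" grouping column asked about above is one more step. A sketch, assuming the df_DateConverted DataFrame from the snippet (the year() function parses the date string that from_unixtime() returns):

import static org.apache.spark.sql.functions.year;

// Derive the year from the converted date string and group by it.
DataFrame df_Grouped = df_DateConverted
    .withColumn("creationDate_Year", year(df_DateConverted.col("creationDt")))
    .groupBy("creationDate_Year")
    .count();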
In PySpark, the equivalent conversion from Unix epoch milliseconds to a DataFrame timestamp is:
df.select(from_unixtime((df.my_date_column.cast('bigint')/1000)).cast('timestamp').alias('my_date_column'))