Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spark scala Casting Unix time to timestamp fails

I am having problem on converting unix time to timestamp.

I have a Dataframe, one column is PosTime. I would like to convert it to Timestamp, but it is half working. can you help?

scala> adsb.printSchema()
root
 |-- Icao: string (nullable = true)
 |-- Alt: long (nullable = true)
 |-- Lat: double (nullable = true)
 |-- Long: double (nullable = true)
 |-- PosTime: long (nullable = true)
 |-- Spd: double (nullable = true)
 |-- Trak: double (nullable = true)
 |-- Type: string (nullable = true)
 |-- Op: string (nullable = true)
 |-- Cou: string (nullable = true)

scala> adsb.show(50)
+------+------+---------+----------+-------------+-----+-----+----+--------------------+--------------------+
|  Icao|   Alt|      Lat|      Long|      PosTime|  Spd| Trak|Type|                  Op|                 Cou|
+------+------+---------+----------+-------------+-----+-----+----+--------------------+--------------------+
|ABECE7|  4825|40.814442| -111.9776|1506875131778|197.0|356.0|B739|     Delta Air Lines|       United States|
|4787B0| 38000|     null|      null|         null| null| null|B738|           Norwegian|              Norway|
|D3B18A|  4222|     null|      null|         null| null| null|null|                null|Unknown or unassi...|
|3C3F78|118400|     null|      null|         null| null| null|null|                null|             Germany|
|AA1C45|   -75|40.695969|-74.166321|1506875131747|157.4| 25.6|null|                null|       United States|
 scala> val adsb1 = adsb.withColumn("PosTime", $"PosTime".cast(TimestampType))

scala> adsb_sort.show(100)
+------+-------+---------+---------+--------------------+-------+-------+----+----+--------------------+
|  Icao|    Alt|      Lat|     Long|             PosTime|    Spd|   Trak|Type|  Op|                 Cou|
+------+-------+---------+---------+--------------------+-------+-------+----+----+--------------------+
|FFFFFF|   null|     null|     null|                null|   null|   null|null|null|Unknown or unassi...|
|FFFFFF|1049093|      0.0|      0.0|49800-05-04 14:39...|28672.0| 1768.7|null|null|Unknown or unassi...|
|FFFFFF|  12458|      0.0|      0.0|49800-12-11 06:39...|    0.0| 2334.4|null|null|Unknown or unassi...|
like image 408
user8149657 Avatar asked Dec 10 '22 09:12

user8149657


1 Answers

Spark interprets Long as timestamp in seconds but it looks like data is in milliseconds:

scala> spark.sql("SELECT CAST(1506875131778 / 1000 AS timestamp)").show
+-------------------------------------------------------------------------+
|CAST((CAST(1506875131778 AS DOUBLE) / CAST(1000 AS DOUBLE)) AS TIMESTAMP)|
+-------------------------------------------------------------------------+
|                                                     2017-10-01 18:25:...|
+-------------------------------------------------------------------------+

If I am right, just divide by 1000:

adsb.withColumn("PosTime", ($"PosTime" / 1000).cast(TimestampType))
like image 187
Alper t. Turker Avatar answered Dec 29 '22 03:12

Alper t. Turker