I have the following dataframe in pyspark:
+--------------------+-------+
|                Name|Seconds|
+--------------------+-------+
|Enviar solicitud ...|   1415|
|Analizar mapa de ...|   1209|
|Modificar solicit...|    591|
|Entregar servicio...|  91049|
+--------------------+-------+
I wish to convert the Seconds column either to a date or a timestamp (hopefully something like to_date below). I am trying to use the following function:
def to_date(seconds=0):
    dat = ''
    if seconds == 0:
        dat = '0'
    if (seconds / 86400) >= 1:
        day = int(seconds / 86400)
        seconds = seconds - 86400 * int(seconds / 86400)
        dat = f'{day}d '
    if (seconds / 3600) >= 1:
        hour = int(seconds / 3600)
        seconds = seconds - 3600 * int(seconds / 3600)
        dat = dat + f'{hour}hr '
    if (seconds / 60) >= 1:
        minutes = int(seconds / 60)
        dat = dat + f'{minutes}min'
    else:
        return '0min'
    return dat
But there is no easy way to do this in PySpark like Pandas' .apply(to_date). Is there any way to achieve what I am trying to do?
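(For reference, the most direct equivalent to .apply is wrapping the function in a udf; a minimal sketch, assuming the to_date function above and an integer Seconds column:)

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# sketch: run the Python function above row by row through a UDF
to_date_udf = udf(to_date, StringType())
df.withColumn("duration", to_date_udf(df["Seconds"])).show(truncate=False)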
EXPECTED OUTPUT:
+--------------------------------------------+-------+------------+
|Name                                        |Seconds|duration    |
+--------------------------------------------+-------+------------+
|Analizar mapa de comparacion de presupuestos|1209   |20min       |
|Crear mapa de comparacion de presupuestos   |12155  |3hr 22min   |
|Entregar servicios de bienes                |91049  |1d 1hr 17min|
+--------------------------------------------+-------+------------+
There is no built-in function for that in Spark, but it can be done without a UDF. You can simply calculate the different parts (days, hours, minutes, seconds) with division and modulo operations, and concatenate them to get the desired formatting.
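For example, breaking 91049 seconds (from the expected output) down with integer division and modulo:

seconds = 91049
days = seconds // 86400            # 1
hours = seconds % 86400 // 3600    # 1
minutes = seconds % 3600 // 60     # 17
secs = seconds % 60                # 29  -> "1d 1hr 17min 29sec"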
For Spark 2.4+, you can use the higher-order function zip_with together with array_join. First create a parts column that contains the number of days, hours, minutes and seconds derived from the Seconds column. Then zip it with the literal array array('d', 'hr', 'min', 'sec') to concatenate each part with its unit, and finally join all the elements with a comma delimiter.
from pyspark.sql.functions import array, array_join, col, expr, floor

# days as a plain quotient (so durations over 7 days are not truncated), then hours/minutes/seconds via (divisor, modulus) pairs
duration_parts = [(3600, 24), (60, 60), (1, 60)]
parts = [floor(col("Seconds") / 86400)] + [floor(col("Seconds") / d) % m for d, m in duration_parts]
# zip each part with its unit and keep only the non-zero ones
exp = "zip_with(parts, array('d', 'hr', 'min', 'sec'), (x, y) -> IF(x > 0, concat(x, y), null))"
df.withColumn("parts", array(*parts))\
    .withColumn("duration", array_join(expr(exp), ", "))\
    .drop("parts")\
    .show(truncate=False)
#+--------------------------------------------+-------+---------------------+
#|Name |Seconds|duration |
#+--------------------------------------------+-------+---------------------+
#|Analizar mapa de comparacion de presupuestos|1209 |20min, 9sec |
#|Crear mapa de comparacion de presupuestos |12155 |3hr, 22min, 35sec |
#|Entregar servicios de bienes |91049 |1d, 1hr, 17min, 29sec|
#+--------------------------------------------+-------+---------------------+
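If you want the duration to match the expected output exactly (no seconds part, parts separated by spaces), the same approach can be adjusted. A minimal sketch, assuming the same df and Seconds column:

from pyspark.sql.functions import array, array_join, col, expr, floor

# only days, hours and minutes, joined with spaces
parts = [floor(col("Seconds") / 86400),
         floor(col("Seconds") % 86400 / 3600),
         floor(col("Seconds") % 3600 / 60)]
exp = "zip_with(parts, array('d', 'hr', 'min'), (x, y) -> IF(x > 0, concat(x, y), null))"

df.withColumn("parts", array(*parts)) \
  .withColumn("duration", array_join(expr(exp), " ")) \
  .drop("parts") \
  .show(truncate=False)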
Another way is to use concat (add when expressions if you don't want the parts that are equal to 0):
from pyspark.sql.functions import col, concat, floor, lit

# concatenate all four parts with their units (zero-valued parts are kept here)
df.withColumn("duration", concat(
    floor(col("Seconds") / 86400), lit("d, "),
    floor(col("Seconds") % 86400 / 3600), lit("hr, "),
    floor((col("Seconds") % 86400) % 3600 / 60), lit("min, "),
    floor(((col("Seconds") % 86400) % 3600) % 60), lit("sec"),
)).show(truncate=False)
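A possible shape for the when variant mentioned above, keeping only the parts that are greater than 0 (a sketch; days, hours and minutes here are local column expressions, not new DataFrame columns):

from pyspark.sql.functions import col, concat, floor, lit, when

# wrap each part in when() so zero-valued parts are omitted
days = floor(col("Seconds") / 86400)
hours = floor(col("Seconds") % 86400 / 3600)
minutes = floor(col("Seconds") % 3600 / 60)

df.withColumn("duration", concat(
    when(days > 0, concat(days, lit("d "))).otherwise(lit("")),
    when(hours > 0, concat(hours, lit("hr "))).otherwise(lit("")),
    concat(minutes, lit("min")),
)).show(truncate=False)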
I think this can be achieved without a UDF, and it will be much faster and more scalable for big data. Try this and let me know if there is a hole in my logic.
from pyspark.sql import functions as F

df.withColumn("Minutes", F.round(F.col("Seconds") / 60, 2))\
    .withColumn("Hours", F.floor(F.col("Minutes") / 60))\
    .withColumn("hourmin", F.floor(F.col("Minutes") - (F.col("Hours").cast("int") * 60)))\
    .withColumn("Days", F.floor(F.col("Hours") / 24))\
    .withColumn("Days2", F.col("Days") * 24)\
    .withColumn("Time",
        F.when((F.col("Hours") == 0) & (F.col("Days") == 0),
               F.concat(F.col("hourmin"), F.lit("min")))
         .when((F.col("Hours") != 0) & (F.col("Days") == 0),
               F.concat(F.col("Hours"), F.lit("hr "), F.col("hourmin"), F.lit("min")))
         .when(F.col("Days") != 0,
               F.concat(F.col("Days"), F.lit("d "), (F.col("Hours") - F.col("Days2")), F.lit("hr "),
                        F.col("hourmin"), F.lit("min"))))\
    .drop("Minutes", "Hours", "hourmin", "Days", "Days2")\
    .show()
+-----------------+-------+---------------+
| Name|Seconds| Time|
+-----------------+-------+---------------+
| Enviar solicitud| 1209| 20min|
| Analizar mapa de| 12155| 3hr 22min|
|Entregar servicio| 91049| 1d 1hr 17min|
| example1| 1900| 31min|
| example2| 2500| 41min|
| example3|9282398|107d 10hr 26min|
+-----------------+-------+---------------+