Let's say I have a column of strings like this:
Hour
0045
2322
And I want it to become like this:
Hour
00:45
23:22
The goal is to then turn it into a timestamp. How would I go about it?
By using the PySpark SQL function regexp_replace() you can replace a matched substring in a column with another string. regexp_replace() uses Java regular expressions for matching; if the pattern does not match, the value is returned unchanged. A typical use is replacing the street abbreviation Rd with Road in an address column, as sketched below.
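A minimal sketch of that kind of replacement (the address data here is invented purely for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, regexp_replace

spark = SparkSession.builder.getOrCreate()

# Toy address data, only to illustrate regexp_replace()
df_addr = spark.createDataFrame([("14 Main Rd",), ("22 Lake Rd",)], ["address"])

# Replace the substring "Rd" with "Road" in the address column
df_addr.withColumn("address", regexp_replace(col("address"), "Rd", "Road")).show()
# +------------+
# |     address|
# +------------+
# |14 Main Road|
# |22 Lake Road|
# +------------+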
If you would like to add a prefix or suffix to multiple column names in a PySpark DataFrame, you can use a for loop with .withColumnRenamed(), iterating over the DataFrame's columns and renaming each one, as in the sketch below.
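A hedged sketch that prefixes every column name with raw_ (the sdf DataFrame here is a stand-in for your own):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame standing in for your own sdf
sdf = spark.createDataFrame([(1, "a")], ["id", "value"])

# Rename every column by adding a "raw_" prefix
for c in sdf.columns:
    sdf = sdf.withColumnRenamed(c, f"raw_{c}")

print(sdf.columns)  # ['raw_id', 'raw_value']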
PySpark SQL also provides the split() function to convert a delimiter-separated string into an array column (StringType to ArrayType) on a DataFrame. It splits the string column on a delimiter such as a space, comma, or pipe, for example:
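A small sketch of split() on a comma-delimited column (the sample data is made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, split

spark = SparkSession.builder.getOrCreate()

# Comma-separated string column
df_csv = spark.createDataFrame([("a,b,c",)], ["letters"])

# split() turns the StringType column into an ArrayType column
df_csv.withColumn("letters_arr", split(col("letters"), ",")).show(truncate=False)
# +-------+-----------+
# |letters|letters_arr|
# +-------+-----------+
# |a,b,c  |[a, b, c]  |
# +-------+-----------+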
You can use regexp_replace:

from pyspark.sql.functions import col, regexp_replace

df.withColumn("Hour", regexp_replace(col("Hour"), r"(\d{2})(\d{2})", "$1:$2")).show()
+-----+
| Hour|
+-----+
|00:45|
|23:22|
+-----+
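Since the end goal is a timestamp, one way to finish (a sketch, assuming the values are times of day with no date component, so the date part defaults to 1970-01-01 depending on your Spark version's parsing behavior) is to parse the reformatted column with to_timestamp:

from pyspark.sql.functions import col, regexp_replace, to_timestamp

# df is the DataFrame from the question, with the "Hour" column as "HHmm" strings
df = (
    df.withColumn("Hour", regexp_replace(col("Hour"), r"(\d{2})(\d{2})", "$1:$2"))
      .withColumn("Hour_ts", to_timestamp(col("Hour"), "HH:mm"))
)
df.show()
# +-----+-------------------+
# | Hour|            Hour_ts|
# +-----+-------------------+
# |00:45|1970-01-01 00:45:00|
# |23:22|1970-01-01 23:22:00|
# +-----+-------------------+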