
Pyspark, Add a character in the middle of a string

Let's say I have a column of strings like this:

Hour
0045
2322

And I want it to become like this:

Hour
00:45
23:22   

I want to do this so that I can convert the column to a timestamp afterwards. How would I go about it?

BryceSoker asked Jan 02 '18 14:01


People also ask

How do you replace a character in a string in Pyspark?

You can use the PySpark SQL function regexp_replace() to replace the parts of a column value that match a pattern with another string. regexp_replace() uses Java regex for matching; if the regex does not match, the value is returned unchanged. For example, it can replace the street-name abbreviation Rd with Road in an address column.

How do you add a prefix in Pyspark?

If you would like to add a prefix or suffix to multiple columns in a PySpark DataFrame, you could use a for loop and .withColumnRenamed(). You can amend sdf.

How do you split a string in Pyspark DataFrame?

PySpark SQL provides the split() function to convert a delimiter-separated string column into an array column (StringType to ArrayType) on a DataFrame. It splits the string column on a delimiter such as a space, comma, or pipe.


1 Answer

You can use regexp_replace to capture the two pairs of digits and put a colon between them:

from pyspark.sql.functions import col, regexp_replace

df.withColumn("Hour", regexp_replace(col("Hour"), "(\\d{2})(\\d{2})", "$1:$2")).show()

+-----+
| hour|
+-----+
|00:45|
|00:50|
+-----+
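
To check the pattern without a Spark session, here is a minimal plain-Python sketch of the same idea. It is only a stand-in: Spark's regexp_replace uses Java regex, where the groups are written `$1` and `$2`, while Python's `re` writes them `\1` and `\2`. The helper names `insert_colon` and `to_time` are hypothetical, and the `^`/`$` anchors are an added assumption so that only exact four-digit values are rewritten.

```python
import re
from datetime import datetime

# Same two-group idea as the pattern above. The anchors are an assumption,
# added so that only values made of exactly four digits are rewritten.
HHMM = re.compile(r"^(\d{2})(\d{2})$")


def insert_colon(value):
    """Return 'HH:MM' for a four-digit string; leave anything else unchanged."""
    return HHMM.sub(r"\1:\2", value)


def to_time(value):
    """Parse an 'HH:MM' string into a datetime.time."""
    return datetime.strptime(value, "%H:%M").time()


if __name__ == "__main__":
    for raw in ["0045", "2322", "n/a"]:
        print(raw, "->", insert_colon(raw))
```

Values that do not match the pattern pass through unchanged, which is also how regexp_replace behaves. For the timestamp step in Spark itself, something like to_timestamp(col("Hour"), "HH:mm") from pyspark.sql.functions should work on the reformatted column, assuming Spark 2.2 or later.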
philantrovert answered Sep 22 '22 04:09