I am trying to add leading zeroes to a column in my PySpark DataFrame.
Input:
ID 123
Expected output:
000000000123
There is an lpad function, which left-pads a string column to width len with pad.
from pyspark.sql.functions import lpad
df.select(lpad(df.ID, 12, '0').alias('s')).collect()
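To make the padding rule concrete, here is a minimal plain-Python sketch of what left-padding to a fixed width does. The helper name lpad_sketch is hypothetical and is not part of PySpark; it also assumes that input longer than the target width is cut down to that width, so check that behaviour against your Spark version if your IDs can exceed 12 characters.

```python
def lpad_sketch(value: str, length: int, pad: str) -> str:
    """Plain-Python approximation of left-padding (hypothetical helper, not a Spark API)."""
    if length <= 0:
        return ""
    if len(value) >= length:
        # Assumption: like SQL-style lpad, longer input is shortened to `length` characters.
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string, then trim it to exactly the number of missing characters.
    fill = (pad * missing)[:missing]
    return fill + value


print(lpad_sketch("123", 12, "0"))
print(lpad_sketch("1234", 12, "0"))
```

Because this works on strings, no cast is needed when the ID column is already a string.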
Use the format_string function to pad the value with leading zeros.
from pyspark.sql.functions import col, format_string
df = spark.createDataFrame([('123',),('1234',)],['number',])
df.show()
+------+
|number|
+------+
|   123|
|  1234|
+------+
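If the number is stored as a string, make sure to cast it to an integer first, because the %d format specifier expects an integer argument. If the values may exceed the 32-bit int range, cast to 'long' instead.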
df = df.withColumn('number_padded', format_string("%012d", col('number').cast('int')))
df.show()
+------+-------------+
|number|number_padded|
+------+-------------+
|   123| 000000000123|
|  1234| 000000001234|
+------+-------------+
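For comparison, the same zero-padded integer formatting can be sketched in plain Python. This is only an illustration of how a zero-padded, width-12 decimal format behaves, not Spark's implementation; the helper name zero_pad_number is hypothetical.

```python
def zero_pad_number(value: str, width: int = 12) -> str:
    """Convert a numeric string to an integer, then format it zero-padded to `width`."""
    number = int(value)  # mirrors the cast step: the format needs an integer, not a string
    # The 0 flag pads with zeros; a minus sign counts toward the total width.
    return f"{number:0{width}d}"


print(zero_pad_number("123"))
print(zero_pad_number("1234"))
```

Unlike string padding, this approach never shortens a value that is already wider than the target width, and a negative number keeps its sign in front of the zeros.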