Prepend zeros to a value in PySpark

I have a dataframe df:

val1   val2  val3
271   70    151
213   1     379
213   3     90
213   6     288
20    55    165

I want to transform this dataframe into:

val1   val2  val3
271   70    0151
213   01    0379
213   03    0090
213   06    0288
020   55    0165

How can I do that in PySpark? And is it possible to do it with Spark SQL? Any help is welcome.

asked Dec 29 '16 by Aman Burman

2 Answers

For numeric types you can use format_string:

from pyspark.sql.functions import format_string

(sc.parallelize([(271, ), (20, ), (3, )])
    .toDF(["val"])
    .select(format_string("%03d", "val"))
    .show())
+------------------------+
|format_string(%03d, val)|
+------------------------+
|                     271|
|                     020|
|                     003|
+------------------------+

For strings, use lpad:

from pyspark.sql.functions import lpad

(sc.parallelize([("271", ), ("20", ), ("3", )])
    .toDF(["val"])
    .select(lpad("val", 3, "0"))
    .show())
+---------------+
|lpad(val, 3, 0)|
+---------------+
|            271|
|            020|
|            003|
+---------------+
answered Nov 11 '22 by zero323


from pyspark.sql.functions import col, format_string

df = spark.createDataFrame([('123',), ('1234',)], ['number'])

df = df.withColumn('number_padded', format_string("%012d", col('number').cast('int')))

df.show()
+------+-------------+
|number|number_padded|
+------+-------------+
|   123| 000000000123|
|  1234| 000000001234|
+------+-------------+
answered Nov 11 '22 by Haris R