How to parse a datetime that comes in Arabic text (٠٤-٢٥-٢٠٢١) into English dates in PySpark

Question

I am reading a JSON file that has some date columns. The issue is that some of the date columns contain dates written in Arabic/Urdu digits:

٠٤-٢٥-٢٠٢١

I want to convert it to an English date in yyyy-mm-dd format. How can I achieve this in PySpark?

Mohana B C · Accepted Answer

You can convert the Arabic digits to English digits by splitting the string and casting each part to decimal.

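from pyspark.sql.functions import col, concat_ws, split

df = spark.createDataFrame([('٠٤-٢٥-٢٠٢١',)], ['arabic'])

# Split the MM-dd-yyyy string on '-', cast each part to decimal, then rejoin as year-month-day.
df.withColumn('split', split('arabic', '-')) \
.withColumn('date', concat_ws('-', col('split')[2].cast('decimal'), col('split')[0].cast('decimal'), col('split')[1].cast('decimal'))) \
.drop('split').show()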

+----------+---------+
|    arabic|     date|
+----------+---------+
|٠٤-٢٥-٢٠٢١ |2021-4-25|
+----------+---------+
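
Note that the result above is a string without zero padding (2021-4-25). A minimal sketch of an alternative, assuming Spark 3.x and the MM-dd-yyyy layout shown in the question, is to map the digits to ASCII with `translate` and then parse with `to_date`, which should give a real date column. The helper name `with_iso_date`, its parameters, and the constant names are hypothetical, chosen only for illustration. PySpark is imported inside the function so the digit tables can be checked without a Spark installation.

```python
# Arabic-Indic digits (U+0660..U+0669) and the Extended Arabic-Indic digits
# (U+06F0..U+06F9) used in Persian and Urdu, built from code points.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))

NON_ASCII_DIGITS = ARABIC_INDIC + EXTENDED_ARABIC_INDIC
ASCII_DIGITS = "0123456789" * 2  # one ASCII digit per source character


def with_iso_date(df, src_col="arabic", dst_col="date", fmt="MM-dd-yyyy"):
    """Hypothetical helper: add dst_col as a date parsed from src_col."""
    from pyspark.sql import functions as F  # lazy import, see note above

    # translate() replaces the i-th character of NON_ASCII_DIGITS
    # with the i-th character of ASCII_DIGITS.
    ascii_text = F.translate(F.col(src_col), NON_ASCII_DIGITS, ASCII_DIGITS)
    # to_date() parses the ASCII string using the given pattern.
    return df.withColumn(dst_col, F.to_date(ascii_text, fmt))
```

With the `df` from the answer above, `with_iso_date(df).show()` would be the intended usage. If a string column is needed instead of a date, `date_format` with the pattern 'yyyy-MM-dd' could be applied to the result.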

Atif · Answer

In the end, I decided to use pandas_udf together with Python's unidecode library:

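from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType
from unidecode import unidecode
import pandas as pd

def unidecode_(val):
    # Transliterate to ASCII; empty or null values become None.
    if val:
        return unidecode(val)


@pandas_udf(StringType())
def a_to_n(col):
    # Apply the transliteration element-wise to each batch (a pandas Series).
    return pd.Series(col.apply(unidecode_))

# df_json is the DataFrame read from the JSON file.
df = df_json.withColumn('checkin_date', a_to_n(F.col("checkin_date")))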

This gives me the desired result.
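
For readers who want the yyyy-mm-dd form asked for in the question without a third-party dependency, here is a rough sketch using only the standard library. It assumes the input is always month-day-year as in the example; the function names `to_ascii_digits` and `to_iso_date` are hypothetical. A function like `to_iso_date` could take the place of `unidecode_` inside the `apply` call above.

```python
import unicodedata
from datetime import datetime


def to_ascii_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    return "".join(
        # unicodedata.decimal() gives the numeric value of a decimal digit,
        # so Arabic-Indic and Extended Arabic-Indic digits are both covered.
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


def to_iso_date(value, fmt="%m-%d-%Y"):
    """Return a 'YYYY-MM-DD' string, or None for empty or null input."""
    if not value:
        return None
    # Normalize the digits first, then parse and re-emit in ISO format.
    parsed = datetime.strptime(to_ascii_digits(value), fmt)
    return parsed.date().isoformat()
```

Values that do not match the assumed format raise a ValueError here; whether to catch that and return None instead depends on how dirty the data is.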
