I have a dataframe which looks like this:
+------------------------+----------+
|Postal code             |PostalCode|
+------------------------+----------+
|Muxía                   |null      |
|Fuensanta               |null      |
|Salobre                 |null      |
|Bolulla                 |null      |
|33004                   |null      |
|Santa Eulàlia de Ronçana|null      |
|Cabañes de Esgueva      |null      |
|Vallarta de Bureba      |null      |
|Villaverde del Monte    |null      |
|Villaluenga del Rosario |null      |
+------------------------+----------+
If the Postal code column contains only digits, I want to store the value in a new column that holds only numerical postal codes. If it contains only text, I want to store it in a new column called 'Municipality'.
I tried to use 'isnan', since my understanding is that it checks whether a value is not a number, but this does not seem to work. Does the column type need to be string for this to work?
My attempt so far:
> df2 = df.withColumn('PostalCode', when(isnan(df['Postal code']), df['Postal code']))
As the dataframe output posted above shows, the new column is null for every row, including the row with postal code '33004'.
Any ideas would be much appreciated.
isnan only returns true if the column contains a floating-point NaN (not a number) value, such as float('nan'). In any other case, including strings, it returns false. If you want to check whether a column contains a numerical value, one option is to define your own UDF, for example as shown below:
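from pyspark.sql.functions import udf, when
from pyspark.sql.types import BooleanType

# Assumes an active SparkSession named `spark` (as in the pyspark shell).
df = spark.createDataFrame([('33004', ''), ('Muxia', None), ('Fuensanta', None)], ("Postal code", "PostalCode"))

def is_digit(value):
    # None and empty strings count as non-numeric.
    if value:
        return value.isdigit()
    else:
        return False

is_digit_udf = udf(is_digit, BooleanType())
# Keep the value in PostalCode only when it consists of digits; otherwise null.
df = df.withColumn('PostalCode', when(is_digit_udf(df['Postal code']), df['Postal code']))
# Keep the value in Municipality only when it does not consist of digits.
df = df.withColumn('Municipality', when(~is_digit_udf(df['Postal code']), df['Postal code']))
df.show()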
This gives as output:
+-----------+----------+------------+
|Postal code|PostalCode|Municipality|
+-----------+----------+------------+
|      33004|     33004|        null|
|      Muxia|      null|       Muxia|
|  Fuensanta|      null|   Fuensanta|
+-----------+----------+------------+
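The predicate behind the UDF can be checked on its own, without Spark. The sketch below is plain Python: it repeats the answer's is_digit logic and compares it with a stricter variant. The name is_ascii_digits and the sample values are assumptions made for illustration and are not part of the original answer. The comparison is worth a look because str.isdigit() also accepts some non-ASCII digit characters, which may or may not be acceptable for postal codes.

```python
import re

# Hypothetical pattern and helper names, introduced only for this sketch.
_ASCII_DIGITS = re.compile(r"[0-9]+")


def is_digit(value):
    """Same logic as the answer's helper: None and '' count as non-numeric."""
    return value.isdigit() if value else False


def is_ascii_digits(value):
    """Stricter variant: only non-empty strings made of the characters 0-9."""
    return bool(value) and _ASCII_DIGITS.fullmatch(value) is not None


# Sample inputs (assumed): a postal code, a name, an empty string, None,
# and a superscript two, which str.isdigit() treats as a digit.
samples = ["33004", "Muxía", "", None, "²"]
results = {s: (is_digit(s), is_ascii_digits(s)) for s in samples}

for sample, (loose, strict) in results.items():
    print(repr(sample), loose, strict)
```

If the stricter behaviour is what you want, the same idea can probably be expressed without a UDF by using a regular expression on the column, for example df['Postal code'].rlike('^[0-9]+$') inside when(). Treat that as a suggestion to try on your own data, since null handling differs slightly from the UDF version.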