Replace all ":" with "_" in Spark dataframe [duplicate]

I'm trying to replace every ":" with "_" in a single column of a Spark dataframe, using this UDF:

import org.apache.spark.sql.functions.udf

val url_cleaner = (s: String) => {
   s.replaceAll(":", "_")
}
val url_cleaner_udf = udf(url_cleaner)
val df = old_df.withColumn("newCol", url_cleaner_udf(old_df("oldCol")))

But I keep getting the error:

 SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 692, ip-10-81-194-29.ec2.internal): java.lang.NullPointerException

Where am I going wrong in the udf?

Feynman27 asked Sep 03 '16 16:09


1 Answer

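The column probably contains nulls. Spark passes a null cell to the UDF as a null String, so s.replaceAll throws the NullPointerException.

Try a null-safe version: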

val urlCleaner = (s:String) => {
   if (s == null) null else s.replaceAll(":","_")
}
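
The same null guard can also be written with Option, which some find easier to read. The following is only a sketch in plain Scala with no Spark dependency; the object name UrlCleanerSketch is made up for illustration. Since ":" is a literal character and not a pattern, it uses replace rather than replaceAll.

```scala
object UrlCleanerSketch {
  // Null-safe cleaner: Option(s) is None when s is null, so the
  // replacement only runs on real strings and null is passed through.
  val urlCleaner: String => String =
    s => Option(s).map(_.replace(':', '_')).orNull

  def main(args: Array[String]): Unit = {
    println(urlCleaner("http://host:8080/path"))
    println(urlCleaner(null))
  }
}
```

The function value can then be wrapped with udf(...) and applied with withColumn exactly as in the question.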

You can also use the built-in regexp_replace(col("oldCol"), ":", "_") instead of your own function; it returns null for null input, so no explicit null check is needed.
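
For reference, a minimal sketch of the built-in approach applied to the question's column names. The sample rows, the local SparkSession settings, and the names RegexpReplaceSketch and clean are assumptions added for illustration; in a real job old_df would come from your own source.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object RegexpReplaceSketch {
  // Adds newCol with every ":" in oldCol replaced by "_".
  // Null cells stay null, so no UDF and no explicit null check are needed.
  def clean(df: DataFrame): DataFrame =
    df.withColumn("newCol", regexp_replace(col("oldCol"), ":", "_"))

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only (assumed settings).
    val spark = SparkSession.builder()
      .appName("regexp-replace-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, including a null like the one behind the NPE.
    val oldDf = Seq((1, "http://host:8080"), (2, null)).toDF("id", "oldCol")

    clean(oldDf).show(false)
    spark.stop()
  }
}
```

Built-in column functions are generally preferable to UDFs where they exist, because Spark can optimize them and they avoid passing each value through a Scala closure.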

T. Gawęda answered Sep 19 '22 00:09