
Spark DataFrame: trim column and convert empty strings to NULL

In Scala / Spark, how can I convert an empty or whitespace-only string, like " ", to "NULL"? The value needs to be trimmed first and then converted to "NULL". Thanks.

dataframe.na.replace("cut", Map(" " -> "NULL")).show //wrong
user1615666 asked Nov 06 '16 03:11


1 Answer

You can create a simple function to do it. First a couple of imports:

import org.apache.spark.sql.functions.{trim, length, when}
import org.apache.spark.sql.Column

and the definition (a when without an otherwise clause returns null for rows that do not match the condition):

def emptyToNull(c: Column) = when(length(trim(c)) > 0, c)

Finally a quick test:

// toDF and $ need spark.implicits._ (already in scope in spark-shell)
val df = Seq(" ", "foo", "", "bar").toDF
df.withColumn("value", emptyToNull($"value")).show

which should yield the following result:

+-----+
|value|
+-----+
| null|
|  foo|
| null|
|  bar|
+-----+

If you want to replace empty strings with the string "NULL", you can add an otherwise clause:

def emptyToNullString(c: Column) = when(length(trim(c)) > 0, c).otherwise("NULL")
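
As a rough sketch of how this could be applied to the cut column from the question, or to every string column at once: the block below assumes a local SparkSession, and the names EmptyToNullExample, trimOrNull and blankStringsToNull are hypothetical helpers made up for illustration, not part of Spark. Note that emptyToNull keeps the original, untrimmed value for non-blank rows; trimOrNull is a variant that also returns the trimmed value.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, trim, when}
import org.apache.spark.sql.types.StringType

object EmptyToNullExample {
  // Same definition as in the answer: blank or whitespace-only strings become null,
  // everything else is returned unchanged (not trimmed).
  def emptyToNull(c: Column): Column = when(length(trim(c)) > 0, c)

  // Hypothetical variant: also return the trimmed value for non-blank strings.
  def trimOrNull(c: Column): Column = when(length(trim(c)) > 0, trim(c))

  // Hypothetical helper: apply emptyToNull to every top-level string column.
  // Assumes column names contain no dots or other characters that need escaping.
  def blankStringsToNull(df: DataFrame): DataFrame =
    df.schema.fields
      .filter(_.dataType == StringType)
      .foldLeft(df)((acc, f) => acc.withColumn(f.name, emptyToNull(col(f.name))))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is fine for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-to-null")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with a "cut" column, as in the question.
    val df = Seq((" ", 1), (" Ideal ", 2), ("", 3)).toDF("cut", "id")

    df.withColumn("cut", trimOrNull($"cut")).show()  // single column, trimmed
    blankStringsToNull(df).show()                    // all string columns

    spark.stop()
  }
}
```

Folding over the schema keeps the column order and leaves non-string columns such as id untouched.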
zero323 answered Oct 26 '22 07:10