 

Spark Dataframe change column value

I have a DataFrame with 170 columns. One column holds a "name" string, and that string can sometimes contain special characters like "'" that are not acceptable when I write the data to Postgres. Can I do something like this:

Df[$'name']=Df[$'name'].map(x => x.replaceAll("'","")) ?

I don't want to parse the whole DataFrame, because it is very large. Please help.

Mike asked Jan 18 '17 10:01

1 Answer

You can't mutate DataFrames; you can only transform them into new DataFrames with updated values. In this case, you can use the regexp_replace function to perform the replacement on the name column:

import org.apache.spark.sql.functions._
// replace every "'" in the name column with an empty string
val updatedDf = Df.withColumn("name", regexp_replace(col("name"), "'", ""))
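For context, here is a minimal, self-contained sketch of the same idea. The SparkSession setup and the toy two-column DataFrame are illustrative assumptions, not part of the original answer:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("strip-quotes").master("local[*]").getOrCreate()
import spark.implicits._

// Toy stand-in for the real 170-column DataFrame
val Df = Seq(("O'Brien", 1), ("Smith", 2)).toDF("name", "id")

// withColumn returns a NEW DataFrame; only "name" is rewritten,
// every other column passes through untouched
val updatedDf = Df.withColumn("name", regexp_replace(col("name"), "'", ""))

updatedDf.show()
// name becomes "OBrien" and "Smith" -- the apostrophe is gone, id is unchanged

Since withColumn only touches the named column and Spark evaluates transformations lazily, the remaining 169 columns are simply carried through unchanged when you write the result to Postgres.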
Tzach Zohar answered Sep 29 '22 10:09