I am pretty new to Spark and would like to perform an operation on a DataFrame column that replaces every comma (,) in the column with a period (.).
Assume there is a DataFrame x with a column x4:
x4
1,3435
1,6566
-0,34435
I want the output to be:
x4
1.3435
1.6566
-0.34435
The code I am using is
import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)
But I get the following error
import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
       def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)
Any help with the syntax or logic, or any other suitable approach, would be much appreciated.
regexp_replace is a string function that replaces part of a string (a substring) in a DataFrame column with another string, using a regular expression (regex). This function returns an org.apache.spark...
The Spark SQL function regexp_replace can also be used to remove special characters from a string column in a Spark DataFrame.
Here's a reproducible example, assuming x4 is a string column.
import org.apache.spark.sql.functions.regexp_replace
val df = spark.createDataFrame(Seq(
  (1, "1,3435"),
  (2, "1,6566"),
  (3, "-0,34435"))).toDF("Id", "x4")
The syntax is regexp_replace(str, pattern, replacement), which translates to:
df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
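If the goal is to do arithmetic on these values, the cleaned string can also be cast to a numeric type in the same step. The following is a minimal sketch under a few assumptions: the column name x4Num and the local SparkSession setup are only for illustration (in spark-shell, spark already exists), and x4 is assumed to be a string column as above. The comma needs no escaping in the pattern, so "," behaves the same as "\\,".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

// Assumed local session for illustration; getOrCreate reuses an existing one if present.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("comma-to-dot")
  .getOrCreate()

val df = spark.createDataFrame(Seq(
  (1, "1,3435"),
  (2, "1,6566"),
  (3, "-0,34435"))).toDF("Id", "x4")

// Replace the decimal comma with a period, then cast the string to a double.
// x4Num is a hypothetical column name chosen for this example.
val numeric = df.withColumn(
  "x4Num",
  regexp_replace(col("x4"), ",", ".").cast("double"))

numeric.printSchema()
```

Depending on the ANSI setting of the Spark version in use, a value that cannot be parsed as a number either becomes null or raises an error, so it may be worth checking the result before relying on it.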