
how to use Regexp_replace in spark

I am pretty new to Spark and would like to perform an operation on a column of a DataFrame so as to replace all the "," characters in the column with ".".

Assume there is a dataframe x and column x4

x4
1,3435
1,6566
-0,34435

I want the output to be:

x4
1.3435
1.6566
-0.34435

The code I am using is:

import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)

But I get the following error

import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
       def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)

Any help on the syntax, the logic, or any other suitable way of doing this would be much appreciated.

asked Oct 17 '16 by user3420819

People also ask

What is regexp_replace in PySpark?

regexp_replace is a string function that is used to replace part of a string (substring) value with another string in a DataFrame column by using a regular expression (regex). This function returns an org.apache.spark.sql.Column.

How do I remove special characters from a string in Spark?

The Spark SQL function regexp_replace can be used to remove special characters from a string column in a Spark DataFrame.
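As a hedged sketch of that idea (the column name text and the character class are illustrative assumptions, not from the question): a negated character class keeps only letters, digits, and spaces, and everything else is replaced with an empty string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

val spark = SparkSession.builder.master("local[*]").getOrCreate()

val df = spark.createDataFrame(Seq(
  (1, "He@llo, W#orld!"))).toDF("Id", "text")

// Remove any character that is NOT a letter, digit, or space.
val cleaned = df.withColumn(
  "textClean",
  regexp_replace(df("text"), "[^a-zA-Z0-9 ]", ""))
```

Here cleaned would contain "Hello World" in textClean; adjust the character class to whitelist whatever characters your data legitimately needs.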


1 Answer

Here's a reproducible example, assuming x4 is a string column.

import org.apache.spark.sql.functions.regexp_replace

val df = spark.createDataFrame(Seq(
  (1, "1,3435"),
  (2, "1,6566"),
  (3, "-0,34435"))).toDF("Id", "x4")

The syntax is regexp_replace(str, pattern, replacement), which translates to:

df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
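Since the values are decimal numbers written with a comma separator, a natural follow-up (a sketch, assuming you actually want a numeric column rather than a string) is to cast the cleaned column to Double in the same step:

```scala
import org.apache.spark.sql.functions.regexp_replace
import org.apache.spark.sql.types.DoubleType

// Replace the decimal comma, then cast the string to a numeric type.
val numeric = df.withColumn(
  "x4Num",
  regexp_replace(df("x4"), ",", ".").cast(DoubleType))
```

Note that a plain "," works as the pattern here; the comma is not a regex metacharacter, so escaping it as "\\," is harmless but unnecessary.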
answered Sep 20 '22 by mtoto