 

Passing two columns to a udf in scala?

I have a DataFrame containing two columns: one is the data and the other is the character count of that data field.

Data    Count
Hello   5
How     3
World   5

I want to change the value of the Data column based on the value in the Count column. How can this be achieved? I tried this using a udf:

invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("value"),invalidrecords("a_cnt")))

This seems to fail. Is this the correct way to do it?

Asked by Osy on Jul 07 '17


1 Answer

Here's an easy way of doing it.

First, create a DataFrame:

import sqlContext.implicits._
val invalidrecords = Seq(
  ("Hello", 5),
  ("How", 3),
  ("World", 5)
).toDF("Data", "Count")

You should have:

+-----+-----+
|Data |Count|
+-----+-----+
|Hello|5    |
|How  |3    |
|World|5    |
+-----+-----+

Then define the udf function as:

import org.apache.spark.sql.functions._

// placeholder logic: returns a fixed string for every row
def appendDelimiterError = udf((data: String, count: Int) => "value with error")

And call it using withColumn as:

invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)

You should have output as:

+-----+-----+----------------+
|Data |Count|value           |
+-----+-----+----------------+
|Hello|5    |value with error|
|How  |3    |value with error|
|World|5    |value with error|
+-----+-----+----------------+

You can write your own logic inside the udf function instead of returning a fixed string.
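
For instance, here is a minimal sketch of a udf whose result actually uses both columns. The function name flagShortValues, the threshold of 5, and the output format are just illustrative assumptions, not part of the question:

import org.apache.spark.sql.functions._

// Hypothetical example: build a "value" column from both inputs,
// flagging rows whose Count is below an assumed threshold of 5.
def flagShortValues = udf((data: String, count: Int) =>
  if (count < 5) s"$data|TOO_SHORT($count)" else s"$data|OK($count)"
)

invalidrecords
  .withColumn("value", flagShortValues(invalidrecords("Data"), invalidrecords("Count")))
  .show(false)

The shape is always the same: the udf takes one Scala argument per column you pass in, and you hand it the columns in that order.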

Edited

Answering your requirement in the comment below requires changing the udf function and the withColumn call as below:

def appendDelimiterError = udf((data: String, count: Int) => {
  if(count < 5) s"convert value to ${data} - error"
  else data
} )

invalidrecords.withColumn("Data",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)

You should have output as:

+----------------------------+-----+
|Data                        |Count|
+----------------------------+-----+
|Hello                       |5    |
|convert value to How - error|3    |
|World                       |5    |
+----------------------------+-----+
Answered by Ramesh Maharjan on Oct 22 '22