I want to ignore exceptions thrown inside a map() function, for example:
rdd.map(_.toInt)
where rdd is an RDD[String].
But if it meets a non-numeric string, it fails.
What is the easiest way to ignore any exception and skip that line? (I do not want to use filter to handle the exception, because there may be many other kinds of exceptions...)
When you want to handle exceptions in Scala, you use a try {...} catch {...} block as you would in Java, except that the catch block uses pattern matching to identify and handle the exceptions.
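As a minimal sketch of a pattern-matching catch block, the helper below converts a string and returns None instead of throwing. The names TryCatchSketch and parseIntOpt are hypothetical and used only for illustration.

```scala
import scala.util.control.NonFatal

object TryCatchSketch {
  // Hypothetical helper: parse a string, returning None instead of throwing.
  def parseIntOpt(s: String): Option[Int] =
    try {
      Some(s.toInt)
    } catch {
      // Most specific pattern first: the exception toInt throws on bad input.
      case _: NumberFormatException => None
      // Any other non-fatal exception; fatal errors such as
      // OutOfMemoryError are not matched and still propagate.
      case NonFatal(_) => None
    }
}
```

The cases are tried top to bottom, so more specific exception types should come before broader ones.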
Spark's map() transformation applies a function to each row of a DataFrame/Dataset and returns a new, transformed Dataset. map() returns exactly one row for every input row, so the input and the result contain the same number of rows. This is why map() on its own cannot drop a bad line.
mapPartitions() – Similar to map(), except that the function receives an iterator over a whole partition. This lets you do heavy initialization (for example, opening a database connection) once per partition instead of once per row.
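A function meant for mapPartitions() takes an iterator and returns an iterator, so it can be sketched and tried out without a Spark cluster. The names PartitionSketch and parsePartition below are hypothetical; the assumption is that it would be passed as rdd.mapPartitions(PartitionSketch.parsePartition) on an RDD[String].

```scala
import scala.util.{Success, Try}

object PartitionSketch {
  // Hypothetical per-partition function with the shape
  // Iterator[String] => Iterator[Int].
  def parsePartition(lines: Iterator[String]): Iterator[Int] = {
    // Any expensive setup (e.g. opening a connection) would go here,
    // so that it runs once per partition rather than once per row.
    lines
      .map(line => Try(line.toInt))      // wrap each conversion
      .collect { case Success(n) => n }  // keep only the successful ones
  }
}
```

Unlike map(), this can return fewer elements than it received, so unparseable lines simply disappear from the result.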
Exception handling is the mechanism to respond to the occurrence of an exception. Exceptions can be checked or unchecked. Scala only allows unchecked exceptions, though. This means that, at compile-time, we won't be able to know if a method is throwing an exception we are not handling.
You can use a combination of Try and map/filter.
Try wraps a computation in Success if it completes normally, or in Failure if it throws an exception. You can then filter for what you want: in this case the successful computations, but you could also select the failures, for example to log them.
The following code is a possible starting point. You can run and explore it in scastie.org to see if it fits your needs.
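import scala.util.Try

object Main extends App {
  val in = List("1", "2", "3", "abc")
  // Wrap each conversion: Success(n) for valid input, Failure(e) otherwise.
  val out1 = in.map(a => Try(a.toInt))
  // Keep only the successful conversions and unwrap them.
  val results = out1.filter(_.isSuccess).map(_.get)
  println(results) // List(1, 2, 3)
}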
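If the failed lines should be logged rather than silently dropped, one possible variation keeps each input next to its Try and splits the two cases with collect. This is only a sketch on a plain List; the names TrySplitSketch, attempts, parsed and rejected are made up for illustration.

```scala
import scala.util.{Failure, Success, Try}

object TrySplitSketch {
  val in = List("1", "2", "3", "abc")

  // Keep the original input next to its Try so failures can be reported.
  val attempts: List[(String, Try[Int])] = in.map(s => s -> Try(s.toInt))

  // Successful conversions only.
  val parsed: List[Int] = attempts.collect { case (_, Success(n)) => n }

  // Inputs that failed, paired with the exception message, e.g. for logging.
  val rejected: List[(String, String)] =
    attempts.collect { case (s, Failure(e)) => s -> e.getMessage }
}
```

On a Spark RDD the same idea is commonly written as rdd.flatMap(s => Try(s.toInt).toOption), which drops every element whose conversion throws a non-fatal exception.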
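I recommend using filter/map:
rdd.filter(r => NumberUtils.isNumber(r)).map(r => r.toInt)
or flatMap:
exampleRDD.flatMap(r => if (NumberUtils.isNumber(r)) Some(r.toInt) else None)
Note that NumberUtils.isNumber (assuming this is the Apache Commons Lang class) also accepts strings such as "1.5" or "0x1F", for which toInt still throws, so this check does not cover every case.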
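Otherwise you can catch the exception inside the map function:
myRDD.map { r =>
  try {
    r.toInt
  } catch {
    // NumberFormatException is a RuntimeException, so it is caught here.
    case _: RuntimeException => -1
  }
}
and then filter out the -1 values (this assumes -1 never occurs as a legitimate value).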
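A sentinel such as -1 collides with real data if -1 is a valid input. One way around that, sketched here on a plain List, is to return an Option from the try/catch and then flatten it. The names NoSentinelSketch and parseOrNone are hypothetical.

```scala
object NoSentinelSketch {
  // Hypothetical helper: None marks a line to drop, so -1 stays a valid value.
  def parseOrNone(s: String): Option[Int] =
    try Some(s.toInt)
    catch { case _: NumberFormatException => None }

  val in = List("5", "-1", "oops")

  // flatten drops the None entries and unwraps the Some values.
  val kept: List[Int] = in.map(parseOrNone).flatten
}
```

Here kept still contains -1 because it was parsed from valid input. On an RDD, flatMap(parseOrNone) would presumably play the role of map followed by flatten.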