I am trying to catch/ignore parsing errors when reading a JSON file:
val DF = sqlContext.jsonFile("file")
There are a couple of lines that aren't valid JSON objects, but the data is too large (~1TB) to go through line by line.
I've come across exception handling for map operations using import scala.util.Try
and in.map(a => Try(a.toInt)),
referencing:
how to handle the Exception in spark map() function?
How would I catch an exception when reading a JSON file with the function sqlContext.jsonFile?
Thanks!
Unfortunately you are out of luck here. DataFrameReader.json,
which is used under the hood, is pretty much all-or-nothing. If your input contains malformed lines, you have to filter them out manually. A basic solution could look like this:
import scala.util.parsing.json._
val df = sqlContext.read.json(
sc.textFile("file").filter(JSON.parseFull(_).isDefined)
)
Since the above validation is rather expensive, you may prefer to drop jsonFile
/ read.json
entirely and use the parsed JSON lines directly.
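A minimal sketch of that second approach, assuming each line of "file" is a standalone JSON object and that you want to keep only the lines that parse successfully (the Try wrapper and the collect on Map are illustrative choices, not the only way to do this):

```scala
import scala.util.Try
import scala.util.parsing.json.JSON

// Parse each line exactly once; drop anything that is not valid JSON.
// JSON.parseFull returns Option[Any], so flatMap discards failures,
// and Try guards against any unexpected parser exceptions.
val parsed = sc.textFile("file")
  .flatMap(line => Try(JSON.parseFull(line)).toOption.flatten)
  // Keep only top-level JSON objects, which parseFull returns as Maps.
  .collect { case obj: Map[String @unchecked, Any @unchecked] => obj }
```

This avoids parsing every line twice (once for validation, once in read.json), at the cost of working with plain Maps instead of a DataFrame.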