I was using the Play JSON library to parse JSON data in Spark and got the following error message. Does anyone have any clue about the possible cause of this error? If it is due to a bad JSON record, how can I identify the bad record? Thanks!
Here is the main part of the script I used to parse the JSON data:
import play.api.libs.json._
val jsonData = distdata.map(line => Json.parse(line)) //line 194 of script parseJson_v14.scala
// count occurrences of each "pr" query-string parameter value
val filteredData = jsonData.map(json => (json \ "QueryStringParameters" \ "pr").asOpt[String].orNull).countByValue()
The variable distdata is an RDD of JSON data in text format, and jsonData is an RDD of JsValue data. Since Spark transformations are lazy, the error didn't surface until the 2nd command was executed to create filteredData, but according to the error message, the failure comes from the 1st command, where I create jsonData.
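(One way to confirm this, assuming the same jsonData as above, is to force the parse step with an action of its own, before the second command ever runs:
jsonData.count() // an action: triggers Json.parse on every line, so a bad record fails here
)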
[2017-03-29 14:55:39.616]-[Logging$class.logWarning]-[WARN]: Lost task 42.0 in stage 1.0 (TID 90, 10.119.126.114): com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
at [Source: ; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3110)
at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3024)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:1652)
at play.api.libs.json.jackson.JacksonJson$.parseJsValue(JacksonJson.scala:226)
at play.api.libs.json.Json$.parse(Json.scala:21)
at parseJson_v14$$anonfun$1$$anonfun$3$$anonfun$apply$1.apply(parseJson_v14.scala:194)
at parseJson_v14$$anonfun$1$$anonfun$3$$anonfun$apply$1.apply(parseJson_v14.scala:194)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Check that there are no blank lines in distdata,
and that each JSON object sits on a single line, like
{"id":"121", "name":"robot 1"}
{"id":"122", "name":"robot 2"}
as opposed to
{"id":"121", "name":
"robot 1"}
{"id":"122", "name":
"robot 2"}