I have a JSON file:
{
  "a": {
    "b": 1
  }
}
I am trying to read it:
val path = "D:/playground/input.json"
val df = spark.read.json(path)
df.show()
But getting an error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named _corrupt_record by default). For example: spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count() and spark.read.schema(schema).json(file).select("_corrupt_record").show(). Instead, you can cache or save the parsed results and then send the same query. For example, val df = spark.read.schema(schema).json(file).cache() and then df.filter($"_corrupt_record".isNotNull).count().;
So I tried to cache it as they suggest:
val path = "D:/playground/input.json"
val df = spark.read.json(path).cache()
df.show()
But I keep getting the same error.
JSON is read using the spark.read.json("path") function. The multiline_dataframe value is created for reading records from JSON files that span multiple lines: to read such files, set the multiline option to true (by default, multiline is set to false).
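A minimal sketch of that multiline read, assuming the same input.json path used in the question (the multiline_dataframe name follows the text above):
// Pretty-printed JSON spans several lines, so multiline must be set explicitly
val multiline_dataframe = spark.read.option("multiline", "true").json("D:/playground/input.json")
multiline_dataframe.show()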
Spark SQL can also infer the schema using reflection: the Scala interface supports automatically converting an RDD of case classes to a DataFrame. The case class defines the schema of the table, and the names of its arguments are read via reflection and become the names of the columns.
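A minimal sketch of that reflection-based conversion, assuming hypothetical case classes Inner and Record that match the JSON above:
import spark.implicits._
// The case classes define the schema; their field names become the column names
case class Inner(b: Long)
case class Record(a: Inner)
val reflected = Seq(Record(Inner(1))).toDF()
reflected.printSchema()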
You may try either of these two ways.
Option-1: Put the JSON on a single line, as in the answer above by @Avishek Bhattacharya.
Option-2: Add an option to read multi-line JSON, as follows. You can also read a nested attribute, as shown below.
val df = spark.read.option("multiline","true").json("C:\\data\\nested-data.json")
df.select("a.b").show()
Here is the output for Option-2.
20/07/29 23:14:35 INFO DAGScheduler: Job 1 finished: show at NestedJsonReader.scala:23, took 0.181579 s
+---+
| b|
+---+
| 1|
+---+
The problem is with the JSON file. The file "D:/playground/input.json"
looks like you described:
{
  "a": {
    "b": 1
  }
}
This is not right. By default, Spark treats each line of a JSON file as a complete JSON document, so parsing this file fails.
You should keep the complete JSON on a single line, in compact form, with all whitespace and newlines removed, like this:
{"a":{"b":1}}
If you want multiple JSON documents in a single file, keep them like this:
{"a":{"b":1}}
{"a":{"b":2}}
{"a":{"b":3}} ...