My JSON file looks like the one below. It contains two multiline JSON objects in a single file:
{
"name":"John Doe",
"id":"123456"
}
{
"name":"Jane Doe",
"id":"456789"
}
When I load it as a multiline JSON DataFrame, I expect two rows, but only the first JSON object is loaded. How can I load all the multiline JSON objects from a single file?
val rawData = spark.read.option("multiline", true).option("mode", "PERMISSIVE").format("json").load("/tmp/search/baggage/test/1")
scala> rawData.show
+------+--------+
|    id|    name|
+------+--------+
|123456|John Doe|
+------+--------+
scala> rawData.count
res20: Long = 1
Your input is not valid JSON: a file holding multiple objects needs them wrapped in an array, with enclosing brackets and commas between the objects. You can check this with any JSON validator. That is why the multiLine option does not work in this case.
That said, I think you want the JSON Lines format, where each line is a single JSON object:
{"name":"John Doe","id":"123456"}
{"name":"Jane Doe","id":"456789"}
Spark can read this JSON without setting the multiline option:
val df = spark.read.json("file:///your/json/file.json")
df.show()
Output:
+------+--------+
|    id|    name|
+------+--------+
|123456|John Doe|
|456789|Jane Doe|
+------+--------+
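If the file cannot be regenerated as JSON Lines, one possible workaround is to read each file as a single string, split it into its individual JSON values, and hand the resulting one-object-per-line strings to Spark. The sketch below is meant for spark-shell (so `spark` already exists) and assumes the Jackson library bundled with Spark is on the classpath and that each file is small enough to fit in memory as one string. The helper name `splitJsonValues` is made up for this example; the path is the one from the question.

```scala
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import spark.implicits._ // `spark` is the SparkSession provided by spark-shell

// Hypothetical helper: turns text holding several whitespace-separated
// JSON values into one compact JSON string per value.
def splitJsonValues(text: String): List[String] = {
  val mapper = new ObjectMapper()
  // readValues returns an iterator over consecutive root-level JSON values
  val values = mapper.readerFor(classOf[JsonNode]).readValues[JsonNode](text)
  val out = List.newBuilder[String]
  while (values.hasNext) out += values.next().toString
  out.result()
}

// wholeTextFiles yields (fileName, fileContent) pairs, one per file
val jsonLines = spark.sparkContext
  .wholeTextFiles("/tmp/search/baggage/test/1")
  .flatMap { case (_, content) => splitJsonValues(content) }
  .toDS()

// Each element is now a single-line JSON object, so no multiline option is needed
val df = spark.read.json(jsonLines)
df.show()
```

Because the splitting is done by a JSON parser rather than by a regex on `}` and `{`, braces inside string values should not confuse it. For large files, converting the data to JSON Lines up front is likely the better option, since `wholeTextFiles` loads each file in full.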