Using pyspark, how do I read multiple JSON documents on a single line in a file into a dataframe?

Question

Using Spark 2.3, I know I can read a file of JSON documents like this:

{'key': 'val1'}
{'key': 'val2'}

With this:

spark.read.json('filename')

How can I read the following into a dataframe when there are no newlines between the JSON documents?

The following would be an example input.

{'key': 'val1'}{'key': 'val2'}

To be clear, I expect a dataframe with two rows (frame.count() == 2).

Tom Ron · Accepted Answer

To read several files at once, pass a list of file names:

df = spark.read.json(["fileName1","fileName2"])

To read all JSON files in a folder, you can also use a glob pattern:

df = spark.read.json("data/*json")
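
For the single-line case in the question, one possible approach is to split each line into its individual JSON documents before handing them to Spark. The sketch below is a suggestion, not part of the original answer. It assumes strict JSON with double-quoted strings (the single-quoted sample in the question would not parse with Python's json module), and the names split_concatenated_json and read_concatenated_json are hypothetical helpers introduced here for illustration.

```python
import json


def split_concatenated_json(text):
    """Split back-to-back JSON documents into one string per document.

    Assumes strict JSON (double-quoted strings); raises ValueError otherwise.
    """
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse exactly one document; `end` is the index just past it.
        _, end = decoder.raw_decode(text, pos)
        docs.append(text[pos:end])
        pos = end
    return docs


def read_concatenated_json(spark, path):
    """Hypothetical helper: build a dataframe with one row per JSON document.

    `spark` is an existing SparkSession; `path` is a text file whose lines
    may each contain several JSON documents with no separator.
    """
    lines = spark.sparkContext.textFile(path)
    # flatMap turns each line into zero or more single-document strings,
    # and spark.read.json accepts an RDD of JSON strings.
    return spark.read.json(lines.flatMap(split_concatenated_json))
```

With a file containing {"key": "val1"}{"key": "val2"} on one line, read_concatenated_json(spark, 'filename') should give a dataframe with two rows. The splitting runs in Python on the executors, so it is likely slower than the built-in reader on large inputs.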
