I am new to Apache Spark 1.3.1. How can I convert a JSON file to Parquet?
Spark SQL can automatically infer the schema of a JSON dataset and load it as a Dataset<Row>. This conversion can be done using SparkSession.read().json() on either a Dataset<String> or a JSON file.
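As a hedged sketch of that inference (note this uses the Spark 2.2+ API with a SparkSession named `spark`, not the 1.3.1 setup from the question), the schema is derived from the JSON values themselves:

```scala
// Sketch: let Spark SQL infer a schema from in-memory JSON strings.
// Assumes an existing SparkSession called `spark` (Spark 2.2+,
// where spark.read.json accepts a Dataset[String]).
import spark.implicits._

val jsonLines = Seq(
  """{"name": "Alice", "age": 30}""",
  """{"name": "Bob", "age": 25}"""
).toDS()

val df = spark.read.json(jsonLines)
df.printSchema()
// Whole numbers in JSON are inferred as long, strings as string.
```

On Spark 1.3.1 itself, the equivalent entry points are sqlContext.jsonFile for files and sqlContext.jsonRDD for an RDD[String].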
Spark 1.4 and later
You can use Spark SQL to first read the JSON file into a DataFrame, then write the DataFrame out as a Parquet file.
val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")
or
df.save("path/to/parquet/file", "parquet")
Check here and here for examples and more details.
Spark 1.3.1
val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
Issue related to Windows and Spark 1.3.1
Saving a DataFrame as a Parquet file on Windows throws a java.lang.NullPointerException, as described here.
In that case, please consider upgrading to a more recent Spark version.