Where is the reference for options for writing or reading per format?

Question

I use Spark 1.6.1.

We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can use

df.write.orc(<path>)

we would rather do something like

df.write.options(Map("format" -> "orc", "path" -> "/some_path"))

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library. Where can we find a reference to the options that can be passed into the DataFrameWriter? I found nothing in the docs here:

https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameWriter.html#options(java.util.Map)

Jacek Laskowski · Accepted Answer

Where can we find a reference to the options that can be passed into the DataFrameWriter?

The most definitive and authoritative answer is the source code:

CSVOptions
JDBCOptions
JSONOptions
ParquetOptions
TextOptions
OrcOptions
...

Some of the options are described in the docs, but there is no single page listing them all (ideally one auto-generated from the sources so it always stays up to date).

The reason is that the options are deliberately kept separate from the format implementations, which gives you the per-use-case flexibility you want (as you duly noted):

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library.
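For the helper-library use case, a minimal sketch against the Spark 1.6 Scala API might look like the following. Note that the format is not an option key: it is selected with `DataFrameWriter.format`, while `options` carries `path` and any format-specific keys, and the no-argument `save()` takes the location from the `path` option. The object name `DataFrameSink` and its signature are hypothetical, and in 1.6 the `orc` format still requires a `HiveContext`.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper: the caller (e.g. application config) supplies the
// format name and the options, so switching "orc" to "parquet" or moving
// the root path needs no code change here.
object DataFrameSink {
  def write(
      df: DataFrame,
      format: String,               // e.g. "orc", "parquet", "json"
      options: Map[String, String], // "path" plus any format-specific keys
      mode: SaveMode = SaveMode.ErrorIfExists): Unit = {
    df.write
      .format(format)   // selects the data source; "format" is not an option key
      .mode(mode)       // what to do if the target already exists
      .options(options) // handed to the data source as-is
      .save()           // with no argument, the location comes from "path"
  }
}
```

A call such as `DataFrameSink.write(df, "orc", Map("path" -> "/some_path"))` then replaces `df.write.orc(...)`; file-based formats need the `path` entry, while the remaining keys are whatever the chosen format understands.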

Your question seems similar to "How to know the file formats supported by Databricks?", where I said:

Where can I get the list of options supported for each file format?

That's not possible, as there is no API to follow (like in Spark MLlib) to define options. Every format does this on its own, unfortunately, and your best bet is to read the documentation or (more authoritatively) the source code.
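Reading works the same way: in Spark 1.6, `DataFrameReader` also has `format`, `options` and a no-argument `load()` that takes the location from the `path` option. A minimal sketch, using the hypothetical name `DataFrameSource`; the option keys are interpreted only by the selected format, for example `mergeSchema` for Parquet as described in the Spark SQL programming guide.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical reading counterpart: format and options come from config,
// and the keys are interpreted only by the selected data source.
object DataFrameSource {
  def read(
      sqlContext: SQLContext,       // a HiveContext also works (it extends SQLContext)
      format: String,               // e.g. "orc", "parquet", "json"
      options: Map[String, String]): DataFrame =
    sqlContext.read
      .format(format)   // selects the data source
      .options(options) // "path" plus format-specific keys such as "mergeSchema"
      .load()           // with no argument, the location comes from "path"
}
```

For example, `DataFrameSource.read(hiveContext, "parquet", Map("path" -> "/some_path", "mergeSchema" -> "true"))`. Because each data source only looks up the keys it knows, a misspelled key is typically ignored rather than rejected, which is one more reason to check the source for the exact names.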


Tags: apache-spark, apache-spark-sql, apache-spark-1.6

Satyam
