
Spark: what options can be passed with DataFrame.saveAsTable or DataFrameWriter.options?

Neither the developer documentation nor the API documentation says which options can be passed to DataFrame.saveAsTable or DataFrameWriter.options, or how they would affect the saving of a Hive table.

My hope is that in the answers to this question we can aggregate information that would be helpful to Spark developers who want more control over how Spark saves tables and, perhaps, provide a foundation for improving Spark's documentation.

Sim asked Sep 05 '25 02:09


2 Answers

The reason you don't see options documented anywhere is that they are format-specific and developers can keep creating custom write formats with a new set of options.

However, for a few of the supported formats, here are the classes in the Spark source code that define the options:

  • CSVOptions
  • JDBCOptions
  • JSONOptions
  • ParquetOptions
  • TextOptions
  • OrcOptions
  • AvroOptions
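
To illustrate how such format-specific options reach the writer, here is a minimal PySpark sketch. The option names (header, sep, nullValue, compression) are ones defined in CSVOptions; the values, the helper name save_csv_table, and any table names are made up for illustration. It assumes df is a pyspark.sql.DataFrame and does not show how a Hive-backed table would treat these options.

```python
# Option names come from Spark's CSVOptions; the values are illustrative only.
CSV_WRITE_OPTIONS = {
    "header": "true",       # write column names as the first line
    "sep": "|",             # field delimiter
    "nullValue": "NA",      # string written for null values
    "compression": "gzip",  # codec for the output files
}


def save_csv_table(df, table_name, options=None):
    """Save df as a CSV-backed table, passing format-specific options.

    save_csv_table is a hypothetical helper; df is expected to be a
    pyspark.sql.DataFrame.
    """
    opts = dict(CSV_WRITE_OPTIONS if options is None else options)
    (
        df.write
        .format("csv")        # the format decides which options are understood
        .options(**opts)      # DataFrameWriter.options accepts keyword arguments
        .mode("overwrite")
        .saveAsTable(table_name)
    )
```

With a live session this could be called as save_csv_table(some_df, "events_csv"), where both names are placeholders. Swapping the format string and the dictionary (for example "parquet" with only a compression entry) follows the same pattern.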
Ashvjit Singh answered Sep 07 '25 16:09


Take a look at the options file on GitHub, specifically the class "DeltaOptions".

Currently supported options include:

  • replaceWhere
  • mergeSchema
  • overwriteSchema
  • userMetadata
  • partitionOverwriteMode (DYNAMIC, STATIC)
  • maxFilesPerTrigger
  • excludeRegex
  • ignoreFileDeletion
  • ignoreChanges
  • ignoreDeletes
  • skipChangeCommits
  • failOnDataLoss
  • optimizeWrite
  • dataChange
  • queryName
  • checkpointLocation
  • path
  • startingVersion
  • startingTimestamp
  • timestampAsOf
  • versionAsOf
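
As a hedged sketch of how a few of these are passed in practice: mergeSchema and userMetadata are set on a batch write, while versionAsOf is set on a read. The function names, the table name, and the path are hypothetical, and the code assumes a Spark session with Delta Lake already configured. Other entries in the list (for example maxFilesPerTrigger or startingVersion) apply to streaming reads and are not shown.

```python
def append_with_schema_merge(df, table_name, note):
    """Append df to a Delta table, letting new columns extend the schema.

    Hypothetical helper; df is expected to be a pyspark.sql.DataFrame.
    """
    (
        df.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")  # write option from the list above
        .option("userMetadata", note)   # free-text note stored with the commit
        .saveAsTable(table_name)
    )


def read_version(spark, path, version):
    """Read a Delta table as of a given version (time travel).

    Hypothetical helper; spark is expected to be a SparkSession.
    """
    return (
        spark.read
        .format("delta")
        .option("versionAsOf", version)  # read option from the list above
        .load(path)
    )
```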
Alexander Zwitbaum answered Sep 07 '25 16:09