I have loaded an Excel file from S3 using the syntax below, but I am wondering about the options that need to be set here.
Why is it mandatory to set all of the options below when loading an Excel file? None of these options are mandatory when loading other file types such as CSV, DEL, JSON, or Avro.
val data = sqlContext.read
  .format("com.crealytics.spark.excel")
  .option("location", s3path)
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "true")
  .option("addColorColumns", "true")
  .load(s3path)
I get the error below if any of the above options (except location) is not set:
sqlContext.read.format("com.crealytics.spark.excel").option("location", s3path).load(s3path)
Error message:
Name: java.lang.IllegalArgumentException
Message: Parameter "useHeader" is missing in options.
StackTrace: at com.crealytics.spark.excel.DefaultSource.checkParameter(DefaultSource.scala:37)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:19)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:7)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
at $anonfun$1.apply(<console>:47)
at $anonfun$1.apply(<console>:47)
at time(<console>:36)
Most of the options for spark-excel are mandatory, except for userSchema and sheetName.
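For illustration, here is a minimal sketch of supplying a user-defined schema (the optional userSchema) instead of relying on inference. The column names and types are hypothetical, and it assumes sqlContext and s3path are in scope as in your snippet, with the schema passed through DataFrameReader.schema(...):

import org.apache.spark.sql.types._

// Hypothetical schema; replace with the actual columns of your sheet.
val mySchema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true)
))

val data = sqlContext.read
  .format("com.crealytics.spark.excel")
  .option("location", s3path)
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "false")   // no inference needed when a schema is supplied
  .option("addColorColumns", "false")
  .schema(mySchema)                 // supplies the optional userSchema
  .load(s3path)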
You can always verify this in the data source's code, in DefaultSource.scala of the spark-excel repository (the same file that appears in your stack trace).
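The check that produces your error is a simple required-parameter lookup. Here is a rough sketch of what that kind of check looks like (illustrative only, not the exact spark-excel source):

object OptionCheckSketch {
  // If the option is absent, fail with the same style of message
  // seen in the stack trace; otherwise return its value.
  def checkParameter(parameters: Map[String, String], name: String): String =
    parameters.getOrElse(
      name,
      throw new IllegalArgumentException(s"""Parameter "$name" is missing in options.""")
    )
}

Calling OptionCheckSketch.checkParameter(Map.empty, "useHeader") throws exactly the kind of IllegalArgumentException you saw.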
Keep in mind that data source (connector) packages like this are implemented outside of the Spark project, and each one comes with its own rules and parameters.
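By contrast, Spark's built-in sources define a default for every option, which is why a bare load works for them. For example, assuming a CSV file at the same path:

// Built-in CSV source: every option has a default (header=false,
// inferSchema=false, etc.), so no option(...) calls are required.
val csv = sqlContext.read.format("csv").load(s3path)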