It looks like this will error out:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    .mode("overwrite")
    .bucketBy(32, "column")
    .sortBy("column")
    .parquet("s3://....");
with the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now; at org.apache.spark.sql.DataFrameWriter.assertNotBucketed(DataFrameWriter.scala:314)
I see that saveAsTable("myfile") is still supported, but it only writes to the local warehouse directory. How would I take that saveAsTable(...) output and put it on S3 after the job is done?
You can do it like below:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    .option("path", "s3://....")
    .mode("overwrite")
    .format("parquet")
    .bucketBy(32, "column")
    .sortBy("column")
    .saveAsTable("tableName");
This will create an external table pointing to the S3 location; .option("path","s3://....") is the catch here. When a path is supplied, saveAsTable registers the table in the metastore but stores the data files at that external location instead of the default warehouse directory, so the bucketing metadata is kept while the Parquet files land on S3.
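For context, here is a minimal end-to-end sketch. The bucket paths, table name, and "column" are placeholders; it also assumes Hive support is enabled so the table definition survives the job:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BucketedS3Write {
    public static void main(String[] args) {
        // enableHiveSupport() so the table is registered in a persistent metastore
        SparkSession spark = SparkSession.builder()
            .appName("bucketed-s3-write")
            .enableHiveSupport()
            .getOrCreate();

        // "s3://my-bucket/input" is a placeholder source path
        Dataset<Row> df = spark.read().parquet("s3://my-bucket/input");

        df.write()
            .option("compression", "snappy")
            .option("path", "s3://my-bucket/output")  // external location on S3
            .mode("overwrite")
            .format("parquet")
            .bucketBy(32, "column")
            .sortBy("column")
            .saveAsTable("tableName");

        // Later jobs read it back through the metastore; Spark then knows the
        // data is bucketed and can avoid shuffles when joining on "column".
        Dataset<Row> back = spark.table("tableName");
        back.show();

        spark.stop();
    }
}

Reading the data back with spark.table("tableName") rather than spark.read().parquet(...) is what lets Spark pick up the bucketing information from the metastore.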