It looks like this will error out:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    .mode("overwrite")
    .bucketBy(32, "column")
    .sortBy("column")
    .parquet("s3://....");
with the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now; at org.apache.spark.sql.DataFrameWriter.assertNotBucketed(DataFrameWriter.scala:314)
I see that saveAsTable("myfile") is still supported, but it only writes to the local warehouse directory. How would I take that saveAsTable(...) output and put it on S3 after the job is done?
You can do it like below:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    .option("path", "s3://....")
    .mode("overwrite")
    .format("parquet")
    .bucketBy(32, "column")
    .sortBy("column")
    .saveAsTable("tableName");
This will create an external table pointing to the S3 location; .option("path","s3://....") is the catch here. When a path is supplied, saveAsTable registers the table in the metastore but stores the data files at that external location instead of the default warehouse directory, so the bucketing metadata is kept while the Parquet files land on S3.
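For context, here is a minimal end-to-end sketch. The bucket paths, table name, and "column" are placeholders; it also assumes Hive support is enabled so the table definition survives the job:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BucketedS3Write {
    public static void main(String[] args) {
        // enableHiveSupport() so the table is registered in a persistent metastore
        SparkSession spark = SparkSession.builder()
            .appName("bucketed-s3-write")
            .enableHiveSupport()
            .getOrCreate();

        // "s3://my-bucket/input" is a placeholder source path
        Dataset<Row> df = spark.read().parquet("s3://my-bucket/input");

        df.write()
            .option("compression", "snappy")
            .option("path", "s3://my-bucket/output")  // external location on S3
            .mode("overwrite")
            .format("parquet")
            .bucketBy(32, "column")
            .sortBy("column")
            .saveAsTable("tableName");

        // Later jobs read it back through the metastore; Spark then knows the
        // data is bucketed and can avoid shuffles when joining on "column".
        Dataset<Row> back = spark.table("tableName");
        back.show();

        spark.stop();
    }
}

Reading the data back with spark.table("tableName") rather than spark.read().parquet(...) is what lets Spark pick up the bucketing information from the metastore.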