I am trying to save a DataFrame using bucketBy:
df.write.bucketBy(42, "column").format("parquet").save()
But this produces the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now;
Is there any other way to save the result of bucketBy?
As of Spark 2.1, save
does not support bucketing, as the error message notes.
The method bucketBy
buckets the output by the given columns; when specified, the output is laid out on the file system similarly to Hive's bucketing scheme.
There is a JIRA in progress working on Hive bucketing support [SPARK-19256].
So the only available operation after bucketing is saveAsTable,
which saves the content of the DataFrame
/Dataset
as the specified table.
Since saveAsTable goes through the Hive metastore, you are effectively saving the bucketed output as a Hive table.
So what you are attempting isn't possible with save for the time being.
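As a sketch of the working path, the same write can go through saveAsTable instead of save. The input path, table name, and bucket count below are illustrative assumptions, not values from your setup:

```scala
import org.apache.spark.sql.SparkSession

// Hive support is needed because saveAsTable with buckets
// registers the table in the Hive metastore.
val spark = SparkSession.builder()
  .appName("bucketing-example")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read.parquet("/path/to/input")  // assumed input path

df.write
  .bucketBy(42, "column")      // numBuckets is required; 42 is arbitrary here
  .sortBy("column")            // optional: sort rows within each bucket
  .format("parquet")
  .saveAsTable("bucketed_tbl") // works, unlike .save()
```

The data files land under the warehouse directory for `bucketed_tbl`, and subsequent joins or aggregations on the bucket column can avoid a shuffle.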