I am trying to save a DataFrame using bucketBy:
df.write.bucketBy(42, "column").format("parquet").save()
But this produces the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now;
Is there any other way to save the result of bucketBy?
As of Spark 2.1, save
does not support bucketing, as the error message notes.
The method bucketBy
buckets the output by the given columns; when specified, the output is laid out on the file system similarly to Hive's bucketing scheme.
There is a JIRA in progress working on Hive bucketing support [SPARK-19256].
So the only available operation after bucketing is saveAsTable,
which saves the content of the DataFrame
/Dataset
as the specified table.
Since saveAsTable goes through the Hive metastore, you are effectively saving the bucketed output as a Hive table.
So what you are attempting isn't possible with save for the time being.
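As a sketch of the working path, the same write can go through saveAsTable instead of save. The input path, table name, and bucket count below are illustrative assumptions, not values from your setup:

```scala
import org.apache.spark.sql.SparkSession

// Hive support is needed because saveAsTable with buckets
// registers the table in the Hive metastore.
val spark = SparkSession.builder()
  .appName("bucketing-example")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read.parquet("/path/to/input")  // assumed input path

df.write
  .bucketBy(42, "column")      // numBuckets is required; 42 is arbitrary here
  .sortBy("column")            // optional: sort rows within each bucket
  .format("parquet")
  .saveAsTable("bucketed_tbl") // works, unlike .save()
```

The data files land under the warehouse directory for `bucketed_tbl`, and subsequent joins or aggregations on the bucket column can avoid a shuffle.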