How to add extra metadata when writing Parquet files with Spark

It looks like Spark writes "org.apache.spark.sql.parquet.row.metadata" to the Parquet file footer by default. But what if I want to write some arbitrary metadata (such as version=123) to a Parquet file produced by Spark?

This does NOT work:

df.write().option("version","123").parquet("somefile.parquet");

I'm using Spark 1.6.2.

xfj asked Sep 19 '25 01:09


1 Answer

Column-level metadata: yes, that is possible; see my comment.
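For the column-level case, here is a minimal sketch in Java against the Spark 1.6 API. It assumes that field metadata attached via Column.as(alias, metadata) is serialized with the schema that Spark stores in the Parquet footer, so it can be read back from the schema afterwards. The class name, the "id" column, and the local master setting are made up for illustration; only the "version" key and the output path come from the question.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.MetadataBuilder;

// Hypothetical example class; not part of Spark.
public class ColumnMetadataExample {

    // Build the key/value metadata to attach to a column.
    static Metadata versionMetadata() {
        return new MetadataBuilder().putString("version", "123").build();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("column-metadata")
                .setMaster("local[*]"); // assumption: local run for the demo
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Small DataFrame with a single LongType column named "id".
        DataFrame df = sqlContext.range(0, 3);

        // Re-alias the column under the same name, attaching the metadata.
        DataFrame tagged = df.select(df.col("id").as("id", versionMetadata()));

        // The schema, including field metadata, is written with the data.
        tagged.write().parquet("somefile.parquet");

        // Read the file back and look up the metadata on the field.
        DataFrame readBack = sqlContext.read().parquet("somefile.parquet");
        Metadata restored = readBack.schema().apply("id").metadata();
        System.out.println(restored.getString("version"));

        sc.stop();
    }
}
```

Note that this attaches the value to one column's schema entry rather than adding a separate top-level key to the footer, so tools other than Spark will only see it if they parse Spark's schema JSON.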

Table-level comments/user metadata: sadly, not yet. See https://issues.apache.org/jira/browse/SPARK-10803

James Tobin answered Sep 21 '25 08:09