Databricks/Spark read custom metadata from Parquet file

I created a Parquet file with custom metadata at the file level. Now I'm trying to read that metadata back in (Azure) Databricks, but neither

data = spark.read.option("mergeSchema", "true").parquet(path)

nor the following code returns any of the metadata, even though I know it is present in the file:

storageaccount = 'zzzzzz'
containername = 'yyyyy'
access_key = 'xxxx'
spark.conf.set(f'fs.azure.account.key.{storageaccount}.blob.core.windows.net', access_key)

path = f"wasbs://{containername}@{storageaccount}.blob.core.windows.net/generated_example_10m.parquet"
data = spark.read.format('parquet').load(path)
data.printSchema()  # printSchema() already prints; wrapping it in print() just prints "None"
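
For reference, custom file-level key-value metadata is usually written into the Parquet footer with a library such as pyarrow rather than through Spark's DataFrameWriter. A minimal sketch of how such a file might be written; the table contents, keys, values, and output path below are all hypothetical placeholders:

import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical example data
table = pa.table({"id": [1, 2, 3]})

# Parquet key-value metadata is stored as bytes; merge custom keys
# into whatever schema metadata already exists
existing_meta = table.schema.metadata or {}
custom_meta = {b"owner": b"me", b"comment": b"custom file-level metadata"}
table = table.replace_schema_metadata({**existing_meta, **custom_meta})

pq.write_table(table, "example.parquet")
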
asked Nov 23 '25 by Korenaga

1 Answer

I tried to reproduce the same thing in my environment. Use select("*", "_metadata") to expose the hidden file metadata column:

path = "wasbs://<container>@<storage_account_name>.blob.core.windows.net/<file_path>.parquet"
data = spark.read.format('parquet').load(path).select("*", "_metadata")
display(data)
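
On recent Databricks runtimes, _metadata is a struct of file attributes (file_path, file_name, file_size, file_modification_time), so you can also select individual fields. A small sketch, assuming the same path variable as above:

# Pull individual fields out of the hidden _metadata struct
data = spark.read.format('parquet').load(path)
display(data.select("_metadata.file_name",
                    "_metadata.file_size",
                    "_metadata.file_modification_time"))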

Or specify your schema explicitly and load the path, again adding .select("*", "_metadata"):

# `schema` is a user-defined StructType matching the file's columns
df = spark.read \
  .format("parquet") \
  .schema(schema) \
  .load(path) \
  .select("*", "_metadata")

display(df)

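Note that the _metadata column exposes file attributes (path, name, size, modification time), not the custom key-value metadata stored in the Parquet footer. If it is the footer metadata you need, one way to inspect it is with pyarrow; a minimal sketch, assuming the file is reachable through a local or /dbfs/ path (the path below is hypothetical):

import pyarrow.parquet as pq

# Read only the footer/schema; keys and values come back as bytes
meta = pq.read_schema("/dbfs/tmp/generated_example_10m.parquet").metadata or {}
for key, value in meta.items():
    print(key.decode(), "->", value.decode(errors="replace"))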

answered Nov 25 '25 by Vamsi Bitra


