
Databricks - pyspark.pandas.DataFrame.to_excel does not recognize abfss protocol

I want to save a DataFrame (pyspark.pandas.DataFrame) as an Excel file on Azure Data Lake Gen2 using Azure Databricks in Python. I've switched to pyspark.pandas.DataFrame because it has been the recommended one since Spark 3.2.

There's a method called to_excel that can save a file to a container in ADL, but I'm having problems with the file system access protocols. I already use the to_csv and to_parquet methods of the same class with abfss, and I'd like to do the same for Excel.

When I try to save it using:

import pyspark.pandas as ps
# Omit the df initialization
file_name = "abfss://[email protected]/FILE.xlsx"
sheet = "test"
df.to_excel(file_name, sheet_name=sheet)

I get the error from fsspec:

ValueError: Protocol not known: abfss

Can someone please help me?

Thanks in advance!

walzer91 asked Oct 18 '25 03:10


2 Answers

Try using "abfs://" instead of "abfss://". That worked for me.
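As a rough illustration of this suggestion, the scheme can be swapped with a small helper before the path is passed to to_excel. This is only a sketch: the helper name to_abfs and the container and account names are hypothetical, and whether "abfs" is accepted depends on which filesystem implementations the installed fsspec knows about.

```python
from urllib.parse import urlsplit, urlunsplit


def to_abfs(uri: str) -> str:
    """Return uri with an abfss scheme swapped for abfs; other URIs pass through."""
    parts = urlsplit(uri)
    if parts.scheme != "abfss":
        return uri
    return urlunsplit(parts._replace(scheme="abfs"))


# Hypothetical container and account names, for illustration only.
example = to_abfs("abfss://mycontainer@myaccount.dfs.core.windows.net/FILE.xlsx")
print(example)
```

The result could then be used as df.to_excel(to_abfs(file_name), sheet_name=sheet).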

Anton Eskov answered Oct 20 '25 17:10


The pandas DataFrame does not support the abfss protocol. On Databricks, it seems you can only read and write files on abfss through a Spark DataFrame. So the solution is to write the file locally first and then move it to abfss manually.
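The write-locally-then-move workaround might look roughly like the sketch below. The helper name to_excel_via_local and its copy_fn parameter are hypothetical. On Databricks the copy step is assumed to be dbutils.fs.cp, with the "file:" prefix marking a path on the driver's local disk.

```python
import os
import tempfile


def to_excel_via_local(df, target_uri, copy_fn, **to_excel_kwargs):
    """Write df to a local temporary .xlsx, then pass it to copy_fn(src, dst)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Reuse the target file name for the temporary local copy.
        local_path = os.path.join(tmp_dir, os.path.basename(target_uri))
        df.to_excel(local_path, **to_excel_kwargs)
        # Copy while the temporary directory still exists.
        copy_fn("file:" + local_path, target_uri)


# On Databricks (assumption: `dbutils` is available in the notebook):
# to_excel_via_local(df, file_name, dbutils.fs.cp, sheet_name=sheet)
```

Passing the copy function as an argument keeps the helper independent of the notebook environment, so it can be tried outside Databricks with any function that takes a source and a destination.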

Phuri Chalermkiatsakul answered Oct 20 '25 18:10