Databricks - pyspark.pandas.DataFrame.to_excel does not recognize abfss protocol

Question

I want to save a DataFrame (pyspark.pandas.DataFrame) as an Excel file on Azure Data Lake Gen2 using Azure Databricks in Python. I've switched to pyspark.pandas.DataFrame because it has been the recommended API since Spark 3.2.

There's a method called to_excel (see the documentation) that can save a file to a container in ADL, but I'm running into problems with the file system access protocols. I already use the to_csv and to_parquet methods of the same class with abfss paths, and I'd like to do the same for Excel.

So when I try to save it using:

import pyspark.pandas as ps
# df initialization omitted
file_name = "abfss://[email protected]/FILE.xlsx"
sheet = "test"
# Pass the sheet name variable (the original passed an undefined name, test)
df.to_excel(file_name, sheet_name=sheet)

I get the error from fsspec:

ValueError: Protocol not known: abfss

Can someone please help me?

Thanks in advance!

Anton Eskov · Accepted Answer

Try using "abfs://" instead of "abfss://" - worked for me. See here for more info.
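A minimal sketch of that scheme swap, in case the path is built elsewhere and only the prefix needs changing. The helper name to_abfs and the container and account names in the comment are hypothetical placeholders, not part of the original post. Whether the resulting abfs:// path is actually writable depends on the fsspec backend installed on the cluster, which is not confirmed here.

```python
def to_abfs(uri: str) -> str:
    """Return uri with a leading 'abfss://' replaced by 'abfs://'.

    Hypothetical helper: any other URI is returned unchanged.
    """
    prefix = "abfss://"
    if uri.startswith(prefix):
        return "abfs://" + uri[len(prefix):]
    return uri


# Example with placeholder names (assumed, not from the question):
# to_abfs("abfss://container@account.dfs.core.windows.net/FILE.xlsx")
```

The result can then be passed to df.to_excel in place of the original file_name.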

Phuri Chalermkiatsakul · Answer

The pandas DataFrame does not support the abfss protocol. On Databricks, it seems you can only read and write abfss paths through a Spark DataFrame. So the workaround is to write the file to local storage first and then move it to the abfss location manually. See this answer here.
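The write-locally-then-move approach could look roughly like the sketch below. The function name write_excel_via_local and its copy_fn parameter are hypothetical. The sketch assumes that to_excel writes to the driver's local disk when given a plain local path, and that the copy function accepts a "file:" source URI, as dbutils.fs.cp does on Databricks.

```python
import os
import posixpath
import tempfile


def write_excel_via_local(df, dest_uri, sheet_name, copy_fn):
    """Write df to a temporary local .xlsx file, then copy it to dest_uri.

    df       -- any object with a pandas-style to_excel(path, sheet_name=...)
    dest_uri -- final destination, e.g. an abfss:// path
    copy_fn  -- callable(src, dst) doing the copy (assumed: dbutils.fs.cp)
    """
    # Reuse the destination's file name for the temporary local file.
    local_name = posixpath.basename(dest_uri)
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, local_name)
        # A local path avoids the fsspec protocol lookup that fails for abfss.
        df.to_excel(local_path, sheet_name=sheet_name)
        # The "file:" prefix marks the source as being on the driver's disk.
        copy_fn("file:" + local_path, dest_uri)
    # The temporary directory and the local file are removed at this point.
    return dest_uri


# On Databricks (assumed usage, with the variables from the question):
# write_excel_via_local(df, file_name, sheet, dbutils.fs.cp)
```

Passing the copy function as an argument keeps the sketch independent of dbutils, so it can be tried outside a notebook with any stand-in copy function.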

Tags: python, pandas, apache-spark, azure, azure-databricks

walzer91
