Error while reading or writing Parquet format data

I have created an external table pointing to Azure ADLS with Parquet storage, and while inserting data into that table I get the error below. I am using Databricks for the execution.

org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;

This was working perfectly fine yesterday, and I started getting this error today.

I couldn't find any answer on the internet explaining why this is happening.

Sathya asked May 12 '26 07:05


2 Answers

If you want a workaround that does not require cleaning up dependencies, you can choose one of the sources explicitly by its fully qualified class name (shown here with "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat"):

Replace:

spark.read.parquet("<path_to_parquet_file>")

With:

spark.read.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").load("<path_to_parquet_file>")
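
Since the question is about writing as well as reading, the same idea can be sketched for both directions. This is a minimal PySpark-style sketch, not a confirmed fix for the setup in the question: the helper names read_parquet and write_parquet and the "append" default are assumptions made for illustration, and the only thing taken from the error message is the class name.

```python
# Fully qualified class name taken from the error message in the question.
PARQUET_V1 = "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat"


def read_parquet(spark, path):
    """Read a Parquet path, naming the source class explicitly.

    `spark` is expected to be an existing SparkSession (hypothetical helper).
    """
    return spark.read.format(PARQUET_V1).load(path)


def write_parquet(df, path, mode="append"):
    """Write a DataFrame to a Parquet path, naming the source class explicitly.

    `mode="append"` is an assumed default; use whatever save mode you need.
    """
    df.write.format(PARQUET_V1).mode(mode).save(path)
```

With an existing session this would be used as df = read_parquet(spark, "<path_to_parquet_file>") followed by write_parquet(df, "<path_to_parquet_file>"). Note that this only covers path-based reads and writes. An insert into an existing table picks up its source from the table definition, so this sketch may not help in that case.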

Stian answered May 14 '26 09:05


You may have more than one version of the same jar in the spark/jars/ directory, for example spark-sql_2.12-2.4.4 and spark-sql_2.12-3.0.3, which can leave more than one class registered for the parquet source, as in the error above.
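
One way to check for this is to list the jar directory and group the files by artifact name. The sketch below is only a filename heuristic: it assumes jars are named artifact-version.jar, and the function names duplicate_artifacts and scan_jar_dir are made up for illustration. The directory to scan depends on your installation.

```python
import re
from collections import defaultdict
from pathlib import Path

# Heuristic: "spark-sql_2.12-3.0.3.jar" -> artifact "spark-sql_2.12", version "3.0.3".
# The version is assumed to start at the first "-" that is followed by a digit.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def duplicate_artifacts(jar_names):
    """Return {artifact: [versions]} for artifacts present in more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match:  # names that do not fit the pattern are skipped
            versions[match.group("artifact")].add(match.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}


def scan_jar_dir(jar_dir):
    """Apply duplicate_artifacts to every *.jar file directly inside jar_dir."""
    return duplicate_artifacts(p.name for p in Path(jar_dir).glob("*.jar"))
```

Calling scan_jar_dir("spark/jars") returns an empty dict when no artifact appears in more than one version. Any entry it does return is a candidate for cleanup rather than proof of the cause.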

Moksha answered May 14 '26 10:05


