I have created an external table pointing to Azure ADLS with Parquet storage, and when inserting data into that table I get the error below. I am using Databricks to run this.
org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
This was working perfectly fine yesterday, and I started getting this error only today.
I couldn't find any explanation on the internet for why this is happening.
If you want a workaround that doesn't require cleaning up dependencies, here is how to choose one of the sources explicitly (using "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat" as the example):
Replace:
spark.read.parquet("<path_to_parquet_file>")
With:
spark.read.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").load("<path_to_parquet_file>")
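Alternatively, you may have more than one version of the same jar in the spark/jars/ directory, for example spark-sql_2.12-2.4.4 and spark-sql_2.12-3.0.3. Having two copies of the same classes on the classpath can lead to duplicate-class conflicts like this one.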
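To check for that situation, a small script can list the jars in the directory and report any Spark artifact that appears in more than one version. This is a minimal sketch, not an official tool. It assumes jar file names follow the pattern `spark-<module>_<scala-version>-<version>.jar` (as in the two examples above), and that the directory is `$SPARK_HOME/jars`; the function name `find_conflicting_spark_jars` is hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming convention, e.g. "spark-sql_2.12-3.0.3.jar":
# group 1 = artifact name including the Scala suffix, group 2 = version.
# Names that don't fit (e.g. "...-tests.jar", non-Spark jars) are ignored.
JAR_PATTERN = re.compile(r"^(spark-[a-z0-9-]+_\d+\.\d+)-(\d+(?:\.\d+)*)\.jar$")


def find_conflicting_spark_jars(jars_dir):
    """Return {artifact: [versions]} for Spark artifacts found in more than one version."""
    versions = defaultdict(set)
    for name in os.listdir(jars_dir):
        match = JAR_PATTERN.match(name)
        if match:
            artifact, version = match.groups()
            versions[artifact].add(version)
    # Versions are sorted as plain strings, which is enough for a quick report.
    return {artifact: sorted(found) for artifact, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    # Assumption: jars live in $SPARK_HOME/jars; adjust for your installation.
    jars_dir = os.path.join(os.environ.get("SPARK_HOME", "."), "jars")
    if os.path.isdir(jars_dir):
        conflicts = find_conflicting_spark_jars(jars_dir)
        for artifact, found in conflicts.items():
            print(f"{artifact}: {', '.join(found)}")
        if not conflicts:
            print("No duplicate Spark jars found in", jars_dir)
    else:
        print("Directory not found:", jars_dir)
```

If it reports something like `spark-sql_2.12: 2.4.4, 3.0.3`, removing the jar that doesn't match your Spark version should resolve the conflict without needing the fully qualified format name.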