Read Avro in Azure HDI4.0

I'm trying to read an Avro file from a Jupyter notebook in Azure HDInsight 4.0 with Spark 2.4. I'm not able to properly provide the .jar file to

I've tried the approaches suggested in "How to use Avro on HDInsight Spark/Jupyter?" and in https://learn.microsoft.com/en-in/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages, but I suspect they only apply to Spark 2.3:

%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}

This produces the error message:

pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'

MDP89 asked Nov 18 '25 03:11


1 Answer

The solution that seems to work is:

%%configure -f 
{ "conf": {"spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.0" }}
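
The difference from the configuration in the question is the Maven coordinate: since Spark 2.4 the Avro data source ships as the separate module org.apache.spark:spark-avro, and its Scala suffix and version are expected to match the cluster. As a rough sketch (the helper names avro_configure_cell and read_avro are hypothetical and not part of Spark or HDInsight, and the default Scala version 2.11 is an assumption taken from the coordinate above), the cell text and the subsequent read could be expressed in Python like this:

```python
import json


def avro_configure_cell(spark_version, scala_version="2.11"):
    """Return the text of a %%configure cell that requests the spark-avro module.

    Hypothetical helper for illustration only. The Scala suffix and the
    Spark version are assumed to match what the cluster actually runs.
    """
    coordinate = "org.apache.spark:spark-avro_{}:{}".format(
        scala_version, spark_version
    )
    conf = {"conf": {"spark.jars.packages": coordinate}}
    # -f forces the session to restart so the new package setting is applied.
    return "%%configure -f\n" + json.dumps(conf)


def read_avro(spark, path):
    """Load an Avro file using the "avro" format name named in the error message.

    `spark` is assumed to be the SparkSession the notebook already provides.
    """
    return spark.read.format("avro").load(path)


# Print the cell text for Spark 2.4.0 built against Scala 2.11.
print(avro_configure_cell("2.4.0"))
```

Once the session has restarted with that package, something like df = read_avro(spark, "<path to your .avro file>") should no longer raise the "Failed to find data source: avro" error, assuming the path placeholder is replaced with a location the cluster can read.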

MDP89 answered Nov 19 '25 21:11