 

Which Spark version should I download to run on top of Hadoop 3.1.2?

On the Spark download page, we can choose between releases 3.0.0-preview and 2.4.4.

For release 3.0.0-preview, the package types are:

  • Pre-built for Apache Hadoop 2.7
  • Pre-built for Apache Hadoop 3.2 and later
  • Pre-built with user-provided Apache Hadoop
  • Source code

For release 2.4.4, the package types are:

  • Pre-built for Apache Hadoop 2.7
  • Pre-built for Apache Hadoop 2.6
  • Pre-built with user-provided Apache Hadoop
  • Pre-built with Scala 2.12 and user-provided Apache Hadoop
  • Source code

Since there isn't a "Pre-built for Apache Hadoop 3.1.2" option, can I download the "Pre-built with user-provided Apache Hadoop" package, or should I download the source code and build it myself?

asked Dec 05 '25 03:12 by Henrique Andrade


1 Answer

If you are comfortable building source code, then that is your best option.

Otherwise, since you already have a Hadoop cluster, pick the "user-provided" package and copy your relevant core-site.xml, hive-site.xml, yarn-site.xml, and hdfs-site.xml files into $SPARK_CONF_DIR. With those in place, it should mostly work.
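The copy step can be scripted. The following is a minimal sketch, not a definitive setup: the function name `copy_hadoop_configs` is hypothetical, and it assumes all four files sit in a single source directory, which may not hold for hive-site.xml if Hive keeps its own conf directory. Files that are not found are reported rather than treated as errors.

```python
import shutil
from pathlib import Path

# File names taken from the answer; adjust to whatever your cluster actually uses.
HADOOP_SITE_FILES = ("core-site.xml", "hive-site.xml", "yarn-site.xml", "hdfs-site.xml")


def copy_hadoop_configs(hadoop_conf_dir, spark_conf_dir, names=HADOOP_SITE_FILES):
    """Copy the site files found in hadoop_conf_dir into spark_conf_dir.

    Hypothetical helper for illustration. Returns (copied, missing),
    two lists of file names in the order given by `names`.
    """
    src = Path(hadoop_conf_dir)
    dst = Path(spark_conf_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create $SPARK_CONF_DIR if needed

    copied, missing = [], []
    for name in names:
        source_file = src / name
        if source_file.is_file():
            shutil.copy2(source_file, dst / name)  # keeps timestamps and permissions
            copied.append(name)
        else:
            missing.append(name)  # report it so the caller can locate it manually
    return copied, missing
```

You would typically call it with the values of your own HADOOP_CONF_DIR and SPARK_CONF_DIR environment variables. Separately, Spark's documentation for the "Hadoop free" build describes setting SPARK_DIST_CLASSPATH in conf/spark-env.sh to the output of `hadoop classpath`, so the user-provided package will likely need that as well.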

Note: DataFrames don't work on Hadoop 3 until Spark 3.x (see SPARK-18673).

answered Dec 08 '25 18:12 by OneCricketeer


