Do we still have to make a fat jar for submitting jobs in Spark 2.0.0?

The Spark 2.0.0 release notes state:

Spark 2.0 no longer requires a fat assembly jar for production deployment.

  • Does this mean that we no longer need to build a fat jar to submit jobs?

  • If so, how? In that case, the documentation here is out of date.

asked Aug 10 '16 by Jitsumi


1 Answer

Does this mean that we no longer need to build a fat jar to submit jobs?

Sadly, no. You still have to create an uber JAR to deploy your Spark applications.
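For illustration, here is a minimal sketch of what such a build can look like with sbt and the sbt-assembly plugin (the project name, versions, and the non-Spark dependency below are placeholders, not taken from the question):

    // build.sbt -- minimal fat-jar setup with sbt-assembly (illustrative)
    name := "my-spark-job"    // hypothetical project name
    scalaVersion := "2.11.8"  // Spark 2.0.0 is built against Scala 2.11

    libraryDependencies ++= Seq(
      // Spark is marked "provided": the cluster supplies it at runtime,
      // so it stays out of the fat jar and keeps the artifact small
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided",
      // ordinary dependencies like this one do get bundled
      "com.typesafe" % "config" % "1.3.0"
    )

Marking the Spark artifacts as "provided" is the usual convention: your own code and its third-party dependencies end up in the assembly, while Spark's classes come from the cluster installation.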

The wording in the release notes is very misleading. What it actually means is that Spark itself, as a dependency, is no longer compiled into an uber JAR; it now ships like a normal application, as regular JARs plus their dependencies. You can see this in more detail in SPARK-11157, "Allow Spark to be built without assemblies", and in the design document "Replacing the Spark Assembly with good old jars", which describes the pros and cons of deploying Spark not as several huge JARs (core, streaming, SQL, etc.) but as a set of relatively small, regular-sized JARs containing the code, plus a lib/ directory with all the related dependencies.

If you really want the details, this pull request touches several key parts.
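To round out the sketch above, the plugin is enabled in project/plugins.sbt and the resulting single JAR is submitted as before (the plugin version, class name, and paths are illustrative):

    // project/plugins.sbt -- enables the sbt-assembly plugin
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

    // then build and submit the fat jar, e.g.:
    //   sbt assembly
    //   spark-submit --class com.example.MyJob \
    //     target/scala-2.11/my-spark-job-assembly-0.1.jar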

answered Oct 15 '22 by Yuval Itzchakov