In our project we're using com.typesafe:config in version 1.3.4. According to the latest release notes, this dependency is already provided by Databricks on the cluster, but in a very old version (1.2.1). How can I override the provided dependency with our own version?
We use Maven; in our dependencies I have
<dependency>
    <groupId>com.typesafe</groupId>
    <artifactId>config</artifactId>
    <version>1.3.4</version>
</dependency>
Our created jar file should therefore contain the newer version.
I created a job by uploading the jar file. The job fails because it can't find a method that was added after version 1.2.1, so it looks like the library we bundled is being shadowed by the older version on the cluster.
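A quick way to confirm which version actually wins at runtime is to ask the JVM which jar the library's classes were loaded from. A minimal Scala sketch (run it on the cluster, e.g. at the start of the job; note that getCodeSource can return null for bootstrap-loaded classes):

import com.typesafe.config.ConfigFactory

// Prints the jar that ConfigFactory was loaded from; if it points at the
// cluster's config-1.2.1 jar, our bundled 1.3.4 classes are being shadowed.
println(classOf[ConfigFactory].getProtectionDomain.getCodeSource.getLocation)
println(classOf[ConfigFactory].getClassLoader)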
In the end we fixed this by shading the relevant classes, adding the following to our build.sbt:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "shadedSparkConfigForSpark.@1").inAll
)
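Note that the rename rule only takes effect when the fat jar is built with the sbt-assembly plugin; a minimal project/plugins.sbt sketch (the plugin version shown is just an example):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

With the rule in place, the assembled jar contains our 1.3.4 classes relocated under the shadedSparkConfigForSpark package, with references to them rewritten accordingly, so they no longer clash with the com.typesafe.config 1.2.1 classes provided on the cluster.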
We solved it in the end by utilizing Spark's ChildFirstURLClassLoader. The project is open source, so you can check it out yourself here, as well as the usage of the method here.
But for reference, here is the method in its entirety. You need to provide a Seq of jar names that you want to override with your own; in our case it's the Typesafe config jar.
import java.net.{URL, URLClassLoader}
import scala.annotation.tailrec
import org.apache.spark.util.ChildFirstURLClassLoader

// Environment, logger and ConfigurationException are helpers from the surrounding project.
def getChildFirstClassLoader(jars: Seq[String]): ChildFirstURLClassLoader = {
  val initialLoader = getClass.getClassLoader.asInstanceOf[URLClassLoader]

  @tailrec
  def collectUrls(clazz: ClassLoader, acc: Map[String, URL]): Map[String, URL] = {
    val urlsAcc: Map[String, URL] = acc ++
      // add urls on this level to accumulator
      clazz.asInstanceOf[URLClassLoader].getURLs
        .map(url => (url.getFile.split(Environment.defaultPathSeparator).last, url))
        .filter { case (name, url) => jars.contains(name) }
        .toMap

    // check if any jars without URL are left
    val jarMissing = jars.exists(jar => urlsAcc.get(jar).isEmpty)
    // return accumulated result if there is no parent left or no jars are missing anymore
    if (clazz.getParent == null || !jarMissing) urlsAcc else collectUrls(clazz.getParent, urlsAcc)
  }

  // search the classloader hierarchy until all jars are found or we have reached the top
  val urlsMap = collectUrls(initialLoader, Map())

  // check if everything was found
  val jarsNotFound = jars.filter(jar => urlsMap.get(jar).isEmpty)
  if (jarsNotFound.nonEmpty) {
    logger.info(s"""available jars are ${initialLoader.getURLs.mkString(", ")} (not including parent classpaths)""")
    throw ConfigurationException(s"""jars ${jarsNotFound.mkString(", ")} not found in parent class loaders classpath. Cannot initialize ChildFirstURLClassLoader.""")
  }

  // create child-first classloader
  new ChildFirstURLClassLoader(urlsMap.values.toArray, initialLoader)
}
As you can see, it also has some logic to abort if the jar files you specified are not found anywhere in the classloader hierarchy.
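A usage sketch, assuming our config artifact shows up on the classpath as config-1.3.4.jar (the jar name and the reflective call here are illustrative, not the project's actual call site):

// Build a child-first loader that prefers our own jar over the cluster-provided one.
val childFirstLoader = getChildFirstClassLoader(Seq("config-1.3.4.jar"))

// Classes resolved through this loader come from our jar instead of the old 1.2.1 copy;
// code that needs the newer version has to be loaded through this loader, e.g. reflectively.
val configFactoryClass = childFirstLoader.loadClass("com.typesafe.config.ConfigFactory")
println(configFactoryClass.getProtectionDomain.getCodeSource.getLocation)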