Overwrite Databricks Dependency

In our project we're using com.typesafe:config in version 1.3.4. According to the latest release notes, this dependency is already provided by Databricks on the cluster, but in a very old version (1.2.1). How can I overwrite the provided dependency with our own version?

We use Maven, and our dependencies include:

<dependency>
    <groupId>com.typesafe</groupId>
    <artifactId>config</artifactId>
    <version>1.3.4</version>
</dependency>

Our created jar file should therefore contain the newer version.

I created a Job by uploading the jar file. The Job fails because it can't find a method that was added after version 1.2.1, so it looks like the library we provided gets overwritten by the older version on the cluster.
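
One way to confirm which copy of the library actually wins on the cluster is to log where the class was loaded from. A minimal sketch in Scala follows; `JarLocator` and `locationOf` are hypothetical names introduced here for illustration, not part of any library.

```scala
object JarLocator {
  // Returns the location (usually a jar path) a class was loaded from,
  // or None for classes without a code source (e.g. JDK core classes).
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain   <- Option(cls.getProtectionDomain)
      source   <- Option(domain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString
}
```

Calling `JarLocator.locationOf(classOf[com.typesafe.config.Config])` from inside the job should then show whether the class comes from the uploaded jar or from a jar provided by the cluster.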

pgruetter asked Dec 19 '19 14:12


2 Answers

In the end we fixed this by shading the relevant classes, adding the following to our build.sbt:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "shadedSparkConfigForSpark.@1").inAll
)
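
Since the question uses Maven rather than sbt, a comparable relocation can probably be expressed with the maven-shade-plugin. The fragment below is a sketch under assumptions: the plugin version and the `shaded.com.typesafe.config` target package are illustrative choices, not taken from the original answer.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- version is an assumption; use whichever release your build standardizes on -->
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- move the bundled Typesafe Config classes to a private package
               so they cannot clash with the copy provided by the cluster -->
          <relocation>
            <pattern>com.typesafe.config</pattern>
            <shadedPattern>shaded.com.typesafe.config</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shade plugin also rewrites references to the relocated package in the classes it bundles, so application code can keep importing `com.typesafe.config` in source.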
Oscar Bonilla answered Sep 28 '22 03:09


We solved it in the end by using Spark's ChildFirstURLClassLoader. The project is open source, so you can check it out yourself here and see the usage of the method here.

But for reference, here is the method in its entirety. You need to provide a Seq of the jar names that you want to override with your own; in our case it's the Typesafe Config jar.

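// Imports needed by the method. Environment, logger and ConfigurationException are not
// standard names and presumably come from the project itself.
import java.net.{URL, URLClassLoader}
import scala.annotation.tailrec
import org.apache.spark.util.ChildFirstURLClassLoader

def getChildFirstClassLoader(jars: Seq[String]): ChildFirstURLClassLoader = {
  // note: this cast assumes the application class loader is a URLClassLoader (true on Java 8, not on Java 9+)
  val initialLoader = getClass.getClassLoader.asInstanceOf[URLClassLoader]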

  @tailrec
  def collectUrls(clazz: ClassLoader, acc: Map[String, URL]): Map[String, URL] = {

    // add the matching URLs found on this level to the accumulator
    val urlsAcc: Map[String, URL] = acc ++
      clazz.asInstanceOf[URLClassLoader].getURLs
        .map(url => (url.getFile.split(Environment.defaultPathSeparator).last, url))
        .filter { case (name, _) => jars.contains(name) }
        .toMap

    // check if any jars without URL are left
    val jarMissing = jars.exists(jar => urlsAcc.get(jar).isEmpty)
    // return accumulated if there is no parent left or no jars are missing anymore
    if (clazz.getParent == null || !jarMissing) urlsAcc else collectUrls(clazz.getParent, urlsAcc)
  }

  // search classpath hierarchy until all jars are found or we have reached the top
  val urlsMap = collectUrls(initialLoader, Map())

  // check if everything found
  val jarsNotFound = jars.filter( jar => urlsMap.get(jar).isEmpty)
  if (jarsNotFound.nonEmpty) {
    logger.info(s"""available jars are ${initialLoader.getURLs.mkString(", ")} (not including parent classpaths)""")
    throw ConfigurationException(s"""jars ${jarsNotFound.mkString(", ")} not found in parent class loaders classpath. Cannot initialize ChildFirstURLClassLoader.""")
  }
  // create child-first classloader
  new ChildFirstURLClassLoader(urlsMap.values.toArray, initialLoader)
}

As you can see, it also has some logic to abort if the jar files you specified do not exist in the classpath.
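
To illustrate the idea behind a child-first class loader, here is a simplified, self-contained sketch. It is not Spark's ChildFirstURLClassLoader; `ChildFirstSketchLoader` and `ChildFirstDemo` are hypothetical names, and the sketch only shows the lookup order: own URLs first, the regular class loader as a fallback.

```scala
import java.net.{URL, URLClassLoader}

// Simplified illustration of child-first delegation; not Spark's actual class.
// Passing null as the parent means only JDK bootstrap classes are inherited.
class ChildFirstSketchLoader(urls: Array[URL], fallback: ClassLoader)
    extends URLClassLoader(urls, null) {

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    try {
      // 1. look in our own URLs (and the bootstrap loader) first
      super.loadClass(name, resolve)
    } catch {
      // 2. only if that fails, ask the regular class loader
      case _: ClassNotFoundException => fallback.loadClass(name)
    }
}

object ChildFirstDemo {
  def main(args: Array[String]): Unit = {
    val appLoader = getClass.getClassLoader
    // With no URLs of its own, every non-JDK lookup falls through to the fallback.
    val loader = new ChildFirstSketchLoader(Array.empty[URL], appLoader)
    println(loader.loadClass("scala.Option").getName) // prints scala.Option
  }
}
```

In the setup described above, the URL array would contain the jar with the newer Typesafe Config version, so its classes are resolved from there before the cluster's older copy is consulted.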

pgruetter answered Sep 28 '22 03:09