 

Conflict between httpclient version and Apache Spark

I'm developing a Java application using Apache Spark. I use this version:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
</dependency>

In my code, there is a transitive dependency:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

I package my application into a single JAR file. When I deploy it on an EC2 instance using spark-submit, I get this error:

Caused by: java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)

This error clearly shows that spark-submit has loaded an older version of the same Apache httpclient library, which is what causes the conflict. A quick way to confirm this is sketched below.
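One way to verify which JAR the offending class was actually loaded from at runtime is a minimal sketch like the following. Picking AllowAllHostnameVerifier as the class to inspect is an assumption based on the stack trace (the INSTANCE singletons referenced by SSLConnectionSocketFactory's static initializer were only added to the hostname verifier classes in httpclient 4.3):

import org.apache.http.conn.ssl.AllowAllHostnameVerifier;

public class HttpClientOrigin {
    public static void main(String[] args) {
        // Prints the JAR that AllowAllHostnameVerifier was loaded from.
        // If this points at an older httpclient bundled with Spark/Hadoop
        // instead of your 4.5.2 JAR, that class lacks the INSTANCE field
        // and the NoSuchFieldError above follows.
        System.out.println(AllowAllHostnameVerifier.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation());
    }
}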

What is a good way to solve this issue?

For some reason, I cannot upgrade the Spark version in my Java code. However, I could easily do that on the EC2 cluster. Is it possible to deploy my Java code on a cluster running a higher version, say 1.6.1?

asked Jun 15 '16 by Mohamed Taher Alrefaie


1 Answer

As you said in your post, Spark is loading an older version of httpclient. The solution is to use Maven's relocation facility (part of the maven-shade-plugin) to produce a neat, conflict-free project.

Here's an example of how to use it in your pom.xml file:

<project>
  <!-- Your project definition here, with the groupId, artifactId, and its dependencies -->
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <relocations>
                <relocation>
                  <pattern>org.apache.http.client</pattern>
                  <shadedPattern>shaded.org.apache.http.client</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

This will move all classes under org.apache.http.client to shaded.org.apache.http.client, rewriting the references in your compiled classes accordingly, which resolves the conflict. You can verify the result as sketched below.
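To check that the relocation actually took effect, a minimal sketch along these lines can list the relocated entries in the uber-jar (the JAR path here is an assumption; point it at your build's output):

import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ShadeCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; substitute the uber-jar your build produces.
        try (JarFile jar = new JarFile("target/my-spark-app-1.0.jar")) {
            jar.stream()
               .map(JarEntry::getName)
               .filter(name -> name.startsWith("shaded/org/apache/http/client"))
               .forEach(System.out::println);
        }
    }
}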


Original post:

If this is simply a matter of transitive dependencies, you could just add an exclusion to your spark-core dependency to keep out the httpclient used by Spark:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
        </exclusion>
    </exclusions>
</dependency>

I also set the scope to provided on this dependency, since the cluster will provide Spark at runtime.

However, that might muck around with Spark's internal behaviour. If you still get an error after doing this, try Maven's relocation facility described above, which should produce a neat, conflict-free project.

Regarding the fact that you can't upgrade Spark's version, did you use exactly this dependency declaration from mvnrepository?

Spark is backwards compatible, so there shouldn't be any problem deploying your job on a cluster running a higher version.

answered Oct 01 '22 by Jonathan Taws