
Maven using a local Spark library

Due to a recent EKS update on AWS, I was no longer able to run Spark jobs there (the Kubernetes client version had to be upgraded). I have therefore successfully built the latest Spark snapshot (2.4.5-SNAPSHOT, which contains the bugfix I need). Now I want to add it to my project, replacing the old 2.3.3 version.

Unfortunately, I get a compilation error (see below).

I am probably doing something wrong with my pom.xml file. The final goal is to fetch jar files both from remote repositories and from my local one.

Ideas? Thanks!

P.S. Ubuntu 18.04 + IntelliJ.

The relevant parts of the pom.xml file are the following:


        <?xml version="1.0" encoding="UTF-8"?>
        <project xmlns="http://maven.apache.org/POM/4.0.0"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>


I add my local repo...

         <!-- My local repo where the jar file has been placed -->
            <repositories>
                <repository>
                    <id>Local</id>
                    <name>Repository Spark</name>
                    <url>/home/cristian/repository/sparkyspark/spark</url>
                </repository>
            </repositories>

        <groupId>sparkjob</groupId>
        <artifactId>sparkjob</artifactId>
        <version>1.0-SNAPSHOT</version>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
            <maven.test.skip>true</maven.test.skip>
        </properties>

        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass>entry.Main</mainClass>
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>

                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <!-- bind to the packaging phase -->
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-enforcer-plugin</artifactId>
                    <version>1.4.1</version>
                    <configuration>
                        <rules><dependencyConvergence/></rules>
                    </configuration>
                </plugin>
            </plugins>
        </build>



        ...

        <dependencies>
        .... 
        ....
...and here it is, the jar file I need:
         <!-- The last Spark jar file -->
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.4.5-SNAPSHOT</version>
                <exclusions>
                    <exclusion>
                        <groupId>com.fasterxml.jackson.core</groupId>
                        <artifactId>jackson-core</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
        ...
        ....
         </dependencies>
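For reference, I build the project with something like the following (the exact command is just what I'd typically run, the output name is what the assembly plugin should produce by default):

    # Tests are skipped via maven.test.skip; the assembly plugin is bound to
    # the package phase, so this should produce
    # target/sparkjob-1.0-SNAPSHOT-jar-with-dependencies.jar
    mvn clean package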

This is the error message; the path is correct and the file is there.
Ideas? :)

ERROR:

Could not resolve dependencies for project sparkjob:sparkjob:jar:1.0-SNAPSHOT: Failed to collect dependencies at org.apache.spark:spark-core_2.11:jar:2.4.5-SNAPSHOT: Failed to read artifact descriptor for org.apache.spark:spark-core_2.11:jar:2.4.5-SNAPSHOT: Could not transfer artifact org.apache.spark:spark-core_2.11:pom:2.4.5-SNAPSHOT from/to Local (/home/cristian/repository/sparkyspark/spark): Cannot access /home/cristian/repository/sparkyspark/spark with type default using the available connector factories.....

UPDATE: hard-wiring the path with a system-scoped dependency seems to be a good workaround...

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.5-SNAPSHOT</version>
    <scope>system</scope>
    <systemPath>/home/cristian/repository/sparkyspark/spark/spark-core_2.11-2.4.5-SNAPSHOT.jar</systemPath>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
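Another option I'm considering (just a sketch; the coordinates are taken from the dependency above) is to install the locally built jar into the local ~/.m2 repository, so the plain dependency declaration resolves without system scope:

    # Sketch: register the locally built snapshot in ~/.m2/repository so
    # Maven can resolve it like any other dependency (coordinates assumed
    # to match the <dependency> above).
    mvn install:install-file \
        -Dfile=/home/cristian/repository/sparkyspark/spark/spark-core_2.11-2.4.5-SNAPSHOT.jar \
        -DgroupId=org.apache.spark \
        -DartifactId=spark-core_2.11 \
        -Dversion=2.4.5-SNAPSHOT \
        -Dpackaging=jar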

1 Answer

If you want to use a folder as a repository, you have to use the file:// protocol.

So your repository config should be:

<repositories>
    <repository>
       <id>Local</id>
       <name>Repository Spark</name>
       <url>file:///home/cristian/repository/sparkyspark/spark</url>
    </repository>
</repositories>
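Note that the file:// URL only tells Maven how to reach the folder; the folder itself still has to follow the standard Maven repository layout so the artifact and its pom (the artifact descriptor Maven tries to read) can be found. Assuming the coordinates from the question, the layout would look roughly like this:

    /home/cristian/repository/sparkyspark/spark/
        org/apache/spark/spark-core_2.11/2.4.5-SNAPSHOT/
            spark-core_2.11-2.4.5-SNAPSHOT.jar
            spark-core_2.11-2.4.5-SNAPSHOT.pom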