Since Spring-Data-Hadoop is not released yet, it's hard to find a running example configuration to use it with cloudera.
Which Dependencies I need to choose to get a running Spring-Data-Hadoop together with CDH4 (Hadoop 2.0.0-cdh4.1.3)?
By choosing different apporches I got this Exceptions:
NullPointer
Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError
at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405)
at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123)
... 2 more
Version missmatch 7 to 4
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
at org.springframework.data.hadoop.mapreduce.JobFactoryBean.afterPropertiesSet(JobFactoryBean.java:208)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1545)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1483)
... 12 more
This is an example how to configure it.
Maven Setup:
Notes:
mvn dependency:tree
!Pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>com.example.main</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<java-version>1.7</java-version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<spring.version>3.2.0.RELEASE</spring.version>
<spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version>
<hadoop.version.generic>2.0.0-cdh4.1.3</hadoop.version.generic>
<hadoop.version.mr1>2.0.0-mr1-cdh4.1.3</hadoop.version.mr1>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>${spring.version}</version>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-hadoop</artifactId>
<version>${spring.hadoop.version}</version>
<exclusions>
<!-- Excluded the Hadoop dependencies to be sure that they are not mixed
with them provided by cloudera. -->
<exclusion>
<artifactId>hadoop-streaming</artifactId>
<groupId>org.apache.hadoop</groupId>
</exclusion>
<exclusion>
<artifactId>hadoop-tools</artifactId>
<groupId>org.apache.hadoop</groupId>
</exclusion>
</exclusions>
</dependency>
<!-- Hadoop Cloudera Dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version.generic}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version.generic}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-tools</artifactId>
<version>2.0.0-mr1-cdh4.1.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-streaming</artifactId>
<version>2.0.0-mr1-cdh4.1.3</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>${java-version}</source>
<target>${java-version}</target>
</configuration>
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>spring-milestones</id>
<url>http://repo.springsource.org/libs-milestone</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>spring-snapshot</id>
<name>Spring Maven SNAPSHOT Repository</name>
<url>http://repo.springframework.org/snapshot</url>
</repository>
</repositories>
</project>
Spring Setup (applicationContext.xml):
Replace the fs.default.name
with your namenode domain
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
http://www.springframework.org/schema/context/spring-context.xsd http://www.springframework.org/schema/integration
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd">
<hdp:configuration id="hadoopConfiguration">
fs.default.name=hdfs://example.com:8020
</hdp:configuration>
<hdp:job id="wordCountJob"
mapper="com.example.WordMapper"
reducer="com.example.WordReducer"
input-path="/user/christian/input/test"
output-path="/user/christian/output2" />
<hdp:job-runner job-ref="wordCountJob" run-at-startup="true"
wait-for-completion="true" />
With this you should be able to access your cluster.
Some References:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With