 

Spark shell unable to find the HBase class

I am trying to load data from HDFS into an HBase table using Spark Streaming. I place files in an HDFS directory at run time and read them with the textFileStream function (a rough sketch of the intended pipeline follows). Since Spark does not have the HBase jars on its classpath, even importing the HBase classes in the spark shell fails with an error.
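
For context, the intended streaming read looks roughly like the sketch below (the batch interval, HDFS path, and the write step are placeholders, not my exact code):

val ssc = new StreamingContext(sc, Seconds(5))           // batch interval is illustrative
val lines = ssc.textFileStream("hdfs:///path/to/input")  // path is illustrative
lines.foreachRDD { rdd =>
  // each record would be converted to a Put and written to the HBase table here
}
ssc.start()
ssc.awaitTermination()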

scala> import org.apache.hadoop.hbase.mapred.TableOutputFormat
<console>:10: error: object hbase is not a member of package org.apache.hadoop
       import org.apache.hadoop.hbase.mapred.TableOutputFormat

But if I add the HBase jar to the classpath when starting the spark shell, the imports succeed. However, it still cannot find certain classes further down the line.

bin/spark-shell --jars /hbase/hbase-0.94.13/hbase-0.94.13-mapr-1401.jar

scala> import org.apache.hadoop.hbase.{ HBaseConfiguration, HColumnDescriptor, HTableDescriptor }
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor}

scala> import org.apache.hadoop.hbase.client.{ HBaseAdmin, Put }
import org.apache.hadoop.hbase.client.{HBaseAdmin, Put}

scala> import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

scala> import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.mapred.TableOutputFormat

scala> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

scala> import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.util.Bytes

scala> import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapred.JobConf

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> import org.apache.spark.rdd.{ PairRDDFunctions, RDD }
import org.apache.spark.rdd.{PairRDDFunctions, RDD}

scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.StreamingContext._

scala> import org.apache.hadoop.hbase.client.mapr.{BaseTableMappingRules}
import org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules

scala> val conf = HBaseConfiguration.create()
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, hbase-default.xml, hbase-site.xml

scala> val hbaseTableName = "/app/dev/MarketingIt/hbasetables/spark_test"
hbaseTableName: String = /app/dev/MarketingIt/hbasetables/spark_test

scala> val admin = new HBaseAdmin(conf)
java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.MapRTableMappingRules.
==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
        at org.apache.hadoop.hbase.client.HBaseAdmin.commonInit(HBaseAdmin.java:356)
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:156)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
        at $iwC$$iwC$$iwC.<init>(<console>:48)
        at $iwC$$iwC.<init>(<console>:50)
        at $iwC.<init>(<console>:52)
        at <init>(<console>:54)
        at .<init>(<console>:58)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.MapRTableMappingRules.
==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
        at org.apache.hadoop.hbase.client.mapr.TableMappingRulesFactory.create(TableMappingRulesFactory.java:65)
        at org.apache.hadoop.hbase.client.HBaseAdmin.commonInit(HBaseAdmin.java:348)
        ... 47 more
Caused by: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.MapRTableMappingRules.
==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
        at org.apache.hadoop.hbase.client.mapr.GenericHFactory.getImplementorInstance(GenericHFactory.java:40)
        at org.apache.hadoop.hbase.client.mapr.TableMappingRulesFactory.create(TableMappingRulesFactory.java:47)
        ... 48 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.hadoop.hbase.client.mapr.GenericHFactory.getImplementorInstance(GenericHFactory.java:30)
        ... 49 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 65 more

Here, as you can see, I have added all the HBase jars, and Spark can find some of the HBase classes but not others, even though all of the classes are in the same jar I added. Since the error says Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules, I imported that class explicitly, but I still get the same error.
asked Nov 02 '14 by user2135730


2 Answers

If you are using Spark 1.x, try setting the extra classpath property in the Spark configuration.

Add this line to spark-defaults.conf:

spark.executor.extraClassPath /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar

If you are using another distribution, look up the appropriate paths for these jar files.

In addition to the configuration change, add the driver classpath when starting the spark shell or submitting your Spark job:

--driver-class-path
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar
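
As an illustration, a full spark-shell invocation with this option (same CDH jar locations as above; substitute your distribution's paths) would be:

bin/spark-shell --driver-class-path /opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar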

You can also add the jar files to the Spark classpath in spark-env.sh to avoid specifying the full paths every time you start spark-shell or submit a Spark job, but I ran into other issues with that approach; the options above have worked better for me.

export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar

Nothing more is needed for Spark 1.x.

If you are using Spark 0.9, see this blog post (I have not tested on Spark 0.9 myself, and links can break, but it has useful information): http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html

answered by charmee


Add this line to conf/spark-env.sh, replacing ${HBASE_HOME} with the full path to your HBase installation:

export SPARK_CLASSPATH=${HBASE_HOME}/lib/*
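
For example, with the layout from the question (HBase installed under /hbase/hbase-0.94.13; this path is an assumption, substitute your own), you would set:

export SPARK_CLASSPATH=/hbase/hbase-0.94.13/lib/*

Then restart spark-shell and, after the imports from the question, retry the step that originally failed:

scala> val conf = HBaseConfiguration.create()
scala> val admin = new HBaseAdmin(conf)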

answered by ankursingh1000