 

Specify hbase-site.xml to spark-submit

I have a Spark job (written in Scala) that retrieves data from an HBase table on another server. To do this, I first create the HBaseContext like this:

val hBaseContext: HBaseContext = new HBaseContext(sparkContext, HBaseConfiguration.create())

To run the Spark job, I use spark-submit with the required arguments, something like this:

spark-submit  --master=local[*] --executor-memory 4g --executor-cores 2 --num-executors 2 --jars $(for x in `ls -1 ~/spark_libs/*.jar`; do readlink -f $x; done | paste -s | sed -e 's/\t/,/g') --class com.sparksJob.MyMainClass myJarFile.jar "$@"

The problem is that this connects to ZooKeeper on localhost, but I want it to connect to the ZooKeeper instance on another server (the one where HBase runs).

Hardcoding this information works:

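import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
// HBaseContext comes from the HBase Spark connector; its package depends on the version used
val configuration: Configuration = new Configuration()
// Point the HBase client at the remote ZooKeeper quorum
configuration.set("hbase.zookeeper.quorum", "10.190.144.8")
configuration.set("hbase.zookeeper.property.clientPort", "2181")
val hBaseContext: HBaseContext = new HBaseContext(sparkContext, HBaseConfiguration.create(configuration))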

However, I want it to be configurable.

How can I pass spark-submit the path to the hbase-site.xml file it should use?

Petre Popescu asked Oct 19 '22 01:10


1 Answer

You can pass hbase-site.xml as an argument to the --files option. Your example would become:

spark-submit  --master yarn-cluster --files /etc/hbase/conf/hbase-site.xml --executor-memory 4g --executor-cores 2 --num-executors 2 --jars $(for x in `ls -1 ~/spark_libs/*.jar`; do readlink -f $x; done | paste -s | sed -e 's/\t/,/g') --class com.sparksJob.MyMainClass myJarFile.jar "$@"

Note that the master is set to yarn-cluster; with any other master setting, the hbase-site.xml file would be ignored.
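A minimal sketch of the driver side, assuming the Hadoop and HBase client jars (hadoop-common, hbase-common) are on the classpath. HBaseConfiguration.create() loads hbase-default.xml plus any hbase-site.xml it finds on the classpath; when no hbase-site.xml is found, hbase.zookeeper.quorum keeps its default of localhost, which would explain the behaviour described in the question. As far as I know, in yarn-cluster mode the files shipped with --files are placed in the container's working directory, which Spark on YARN adds to the classpath, so create() should pick the file up without code changes. The optional explicit path is an assumption for other setups (for example local mode): the object name HBaseConfLoader and the idea of passing the path as a program argument are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// Hypothetical helper that builds the HBase client configuration for the job.
object HBaseConfLoader {

  /** Returns an HBase client configuration.
    *
    * HBaseConfiguration.create() reads hbase-default.xml and, if present on the
    * classpath, hbase-site.xml (e.g. one shipped with --files in yarn-cluster mode).
    * If siteXmlPath is given, that local file is added as an extra resource;
    * resources added later override values loaded from earlier ones.
    */
  def load(siteXmlPath: Option[String]): Configuration = {
    val conf = HBaseConfiguration.create()
    siteXmlPath.foreach(p => conf.addResource(new Path(p)))
    conf
  }
}
```

In the main class, something like new HBaseContext(sparkContext, HBaseConfLoader.load(args.headOption)) would then replace the hardcoded values, with the path passed after myJarFile.jar on the spark-submit command line (it arrives in args through "$@").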

mgaido answered Nov 15 '22 07:11