I have a Spark job (written in Scala) that retrieves data from an HBase table on another server. To do this, I first create the HBaseContext like this:
val hBaseContext:HBaseContext = new HBaseContext(sparkContext, HBaseConfiguration.create())
When I run the Spark job, I use spark-submit and pass the required arguments, something like this:
spark-submit --master=local[*] --executor-memory 4g --executor-cores 2 --num-executors 2 --jars $(for x in `ls -1 ~/spark_libs/*.jar`; do readlink -f $x; done | paste -s | sed -e 's/\t/,/g') --class com.sparksJob.MyMainClass myJarFile.jar "$@"
The problem is that this connects to ZooKeeper on localhost, but I want it to connect to the ZooKeeper on another server (the one where HBase runs).
Hardcoding this information works:
val configuration: Configuration = new Configuration()
configuration.set("hbase.zookeeper.quorum", "10.190.144.8")
configuration.set("hbase.zookeeper.property.clientPort", "2181")
val hBaseContext:HBaseContext = new HBaseContext(sparkContext, HBaseConfiguration.create(configuration))
However, I want it to be configurable.
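As a stopgap that needs no extra file, the two hardcoded values could instead arrive as program arguments, since the trailing "$@" in the spark-submit command already forwards the wrapper script's arguments to the main class. The following is a minimal sketch under that assumption; the HBaseArgs object and the key=value argument format are hypothetical, not part of Spark or HBase. It collects every argument whose key starts with "hbase." into a map layered over a default client port.

```scala
// Hypothetical helper: collect "key=value" program arguments (forwarded by
// the trailing "$@" in the spark-submit command) into a Map of HBase
// properties, layered over a small set of defaults.
object HBaseArgs {
  // Mirrors the port used in the hardcoded example.
  val Defaults: Map[String, String] = Map(
    "hbase.zookeeper.property.clientPort" -> "2181"
  )

  def parse(args: Seq[String]): Map[String, String] =
    args.foldLeft(Defaults) { (acc, arg) =>
      // Limit 2 keeps any '=' inside the value intact.
      arg.split("=", 2) match {
        case Array(key, value) if key.startsWith("hbase.") && value.nonEmpty =>
          acc + (key -> value)
        case _ => acc // not an HBase property; leave it for other parsing
      }
    }

  def main(args: Array[String]): Unit = {
    val props = parse(args.toSeq)
    // In the real job, each entry would be copied onto the Hadoop
    // Configuration before building the HBaseContext, e.g.
    //   props.foreach { case (k, v) => configuration.set(k, v) }
    props.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

The job could then be launched with, for example, myJarFile.jar hbase.zookeeper.quorum=10.190.144.8 at the end of the spark-submit line, and the resulting entries set on the Configuration passed to HBaseConfiguration.create(configuration).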
How can I tell spark-submit the path of the hbase-site.xml file to use?
You can pass hbase-site.xml as a parameter of the --files option. Your example would become:
spark-submit --master yarn-cluster --files /etc/hbase/conf/hbase-site.xml --executor-memory 4g --executor-cores 2 --num-executors 2 --jars $(for x in `ls -1 ~/spark_libs/*.jar`; do readlink -f $x; done | paste -s | sed -e 's/\t/,/g') --class com.sparksJob.MyMainClass myJarFile.jar "$@"
Note that the master is set to yarn-cluster (in Spark 2.0 and later, written as --master yarn --deploy-mode cluster). With any other master, hbase-site.xml would be ignored.
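If yarn-cluster mode is not an option, one possible alternative is to pass the file's path as a program argument and load it explicitly with Hadoop's Configuration.addResource(Path), which reads the file from the local filesystem rather than the classpath. This is a minimal sketch under stated assumptions: the --hbase-site=PATH argument and the HBaseSiteArg object are hypothetical, the file must be readable on the machine running the driver, and executors only see these settings if HBaseContext ships the driver's Configuration to them, which is worth confirming for the connector version in use.

```scala
import java.io.File

// Hypothetical helper: find an explicit "--hbase-site=PATH" program argument
// and check that the file exists on the machine running the driver.
object HBaseSiteArg {
  private val Prefix = "--hbase-site="

  def resolve(args: Seq[String]): Either[String, String] =
    args.find(_.startsWith(Prefix)).map(_.stripPrefix(Prefix)) match {
      case None =>
        Left("no --hbase-site argument; HBaseConfiguration.create() falls back to the classpath")
      case Some(path) if new File(path).isFile =>
        Right(new File(path).getAbsolutePath)
      case Some(path) =>
        Left(s"hbase-site.xml not found at $path")
    }

  def main(args: Array[String]): Unit =
    resolve(args.toSeq) match {
      case Right(path) =>
        // Sketch of the real job (requires the Hadoop/HBase jars):
        //   val conf = HBaseConfiguration.create()
        //   conf.addResource(new org.apache.hadoop.fs.Path(path))
        //   val hBaseContext = new HBaseContext(sparkContext, conf)
        println(s"would load $path")
      case Left(msg) =>
        println(msg)
    }
}
```

Because addResource overrides properties from previously added resources (unless they are marked final), values in the supplied file take precedence over the defaults that HBaseConfiguration.create() loads.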