I have an aws ec2 cluster setup by the spark-ec2 script.
I would like to configure Zeppelin so that I can write scala code locally on Zeppelin and run it on the cluster (via master). Furthermore I would like to be able to access my s3 buckets.
I followed this guide and this other one however I can not seem to run scala code from zeppelin to my cluster.
I installed Zeppelin locally with
mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1
My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.
I edited the spark interpreter properties on the Zeppelin Webapp to spark://.us-west-2.compute.amazonaws.com:7077 from local[*]
When I test out
sc
in the interpreter, I recieve this error
java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:182) at
When I try to edit "conf/zeppelin-site.xml" to change my port to 8082, no difference.
NOTE: I eventually would also want to access my s3 buckets with something like:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","xxx")
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first
if any benevolent users have any advice (that wasn't already posted on StackOverflow) please let me know!
Most likely your IP address is blocked from connecting to your spark cluster. You can try by launching the spark-shell pointing at that end point (or even just telnetting). To fix it you can log into your AWS account and change the firewall settings. Its also possible that it isn't pointed at the correct host (I'm assuming you removed the specific box from spark://.us-west-2.compute.amazonaws.com:7077
but if not there should be a bit for the .us-west-2). You can try ssh'ing to that machine and running netstat --tcp -l -n to see if its listening (or even just ps aux |grep java to see if Spark is running).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With