Connect to Spark through a SOCKS proxy

TL;DR: How can I connect a local driver to a Spark cluster through a SOCKS proxy?

We have an on-site Spark cluster behind a firewall that blocks most ports. We have SSH access, so I can create a SOCKS proxy with ssh -D 7777 ....

This works fine for browsing the web UIs when my browser uses the proxy, but I do not know how to make a local driver use it.
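At the JVM level, a connection only goes through the ssh -D proxy if the client opts in, for example by handing a java.net.Proxy to a Socket. The Scala sketch below shows this for one plain TCP connection, assuming the tunnel listens on localhost:7777 as above. The object and method names are hypothetical. It is an illustration, not a fix: as far as I know, Spark does not expose a setting that passes such a Proxy to its own networking, and the JVM-wide socksProxyHost/socksProxyPort properties are honoured by java.net.Socket but not by NIO channels like the ones Netty uses.

```scala
import java.net.{InetSocketAddress, Proxy, Socket}
import scala.util.control.NonFatal

// Hypothetical helper object, for illustration only.
object SocksExample {
  // Local port given to `ssh -D`; 7777 matches the command above.
  val SocksPort: Int = 7777

  // A java.net.Proxy describing the local end of the ssh -D tunnel.
  def socksProxy(port: Int = SocksPort): Proxy =
    new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("localhost", port))

  // Opens one TCP connection to host:port, relayed through the SOCKS proxy.
  // The target address is left unresolved, so the host name is sent to the
  // proxy and resolved on the far side of the tunnel, not locally.
  def connectViaSocks(host: String, port: Int, timeoutMs: Int = 5000): Socket = {
    val socket = new Socket(socksProxy())
    try {
      socket.connect(InetSocketAddress.createUnresolved(host, port), timeoutMs)
      socket
    } catch {
      case NonFatal(e) =>
        socket.close() // do not leak the socket if the connect fails
        throw e
    }
  }
}
```

With the tunnel up, SocksExample.connectViaSocks("masterserver", 7077) would open a raw socket to the master through the proxy, but the SparkContext in the code below would still not use it.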

So far I have this, which obviously is not configuring any proxies:

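import org.apache.spark.{SparkConf, SparkContext}

// Connects directly to the master; no proxy is configured anywhere.
val sconf = new SparkConf()
  .setMaster("spark://masterserver:7077")
  .setAppName("MySpark")
new SparkContext(sconf)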

This logs the following messages 16 times before throwing an exception:

15/01/20 14:43:34 INFO Remoting: Starting remoting
15/01/20 14:43:34 ERROR NettyTransport: failed to bind to server-name/ip.ip.ip.ip:0, shutting down Netty transport
15/01/20 14:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/20 14:43:34 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
15/01/20 14:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/01/20 14:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
asked Jan 20 '15 13:01 by Tobber


1 Answer

Your best shot may be to forward a local port to the remote port 7077 and then use setMaster("spark://localhost:nnnn"), where nnnn is the local port you have forwarded.

To do this, use ssh -L instead of -D. I cannot guarantee that this will work, or, if it does, that it will keep working, but at least it spares you from going through an actual proxy for this one port. The most likely cause of failure is secondary connections triggered by the initial one. I haven't tested this yet, but unless there are such secondary connections, it should work in principle.
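A minimal Scala sketch of that setup, assuming the master listens on masterserver:7077 as in the question. The SparkTunnel name, the gateway login user@gateway and the choice of local port are hypothetical placeholders. It only builds the ssh -L command and the matching master URL (and can launch ssh); it does not start Spark itself.

```scala
import scala.sys.process.Process

// Hypothetical description of one `ssh -L` forward:
// localhost:localPort -> remoteHost:remotePort, relayed via the gateway host.
final case class SparkTunnel(
    gateway: String,                    // e.g. "user@gateway" (placeholder)
    remoteHost: String = "masterserver",
    remotePort: Int = 7077,
    localPort: Int = 7077               // any free local port works
) {
  // -N: run no remote command, only forward the port.
  def sshCommand: Seq[String] =
    Seq("ssh", "-N", "-L", s"$localPort:$remoteHost:$remotePort", gateway)

  // Master URL for the driver, pointing at the local end of the forward.
  def masterUrl: String = s"spark://localhost:$localPort"

  // Starts ssh in the background; call destroy() on the result to close the tunnel.
  def start(): Process = Process(sshCommand).run()
}
```

With the tunnel running, the driver would use new SparkConf().setMaster(SparkTunnel("user@gateway").masterUrl). This only covers the driver-to-master connection; in standalone mode the executors also connect back to the driver, which is exactly the kind of secondary connection that can still fail.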

Also, this doesn't answer the TL;DR version of your question, but since you have SSH access, it is more likely to work.

answered Oct 29 '22 21:10 by Rick Moritz