I'm setting up Hadoop (0.20.2). For starters, I just want it to run on a single machine - I'll probably need a cluster at some point, but I'll worry about that when I get there. I got it to the point where my client code can connect to the job tracker and start jobs, but there's one problem: the job tracker is only accessible from the same machine that it's running on. I actually did a port scan with nmap, and it shows port 9001 open when scanning from the Hadoop machine, and closed when it's from somewhere else.
I tried this on three machines (one Mac, one Ubuntu, and an Ubuntu VM running in VirtualBox), and the result is the same on all of them. None of them have any firewalls set up, so I'm pretty sure it's a Hadoop problem. Any suggestions?
In your Hadoop configuration files, do fs.default.name and mapred.job.tracker refer to localhost?
If so, then Hadoop will only listen on ports 9000 and 9001 on the loopback interface, which is inaccessible from any other host. Make sure fs.default.name and mapred.job.tracker refer to your machine's externally accessible host name.
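To check whether the host name you put in those two properties is reachable from outside, you can see what it resolves to on the Hadoop machine. This is only a rough sketch: the function name resolves_to_loopback_only is made up for illustration, and it assumes the name resolves the same way for Python as it does for Hadoop.

```python
import ipaddress
import socket


def resolves_to_loopback_only(host):
    """Return True if every address `host` resolves to is a loopback address."""
    infos = socket.getaddrinfo(host, None)
    # Each entry's sockaddr starts with the address string.
    addresses = {info[4][0] for info in infos}
    # Drop any IPv6 scope suffix such as "%eth0" before parsing.
    return all(
        ipaddress.ip_address(addr.split("%")[0]).is_loopback
        for addr in addresses
    )


if __name__ == "__main__":
    # Replace "localhost" with the host name used in fs.default.name
    # and mapred.job.tracker.
    print(resolves_to_loopback_only("localhost"))
```

If this prints True for the configured name, the daemons will most likely end up bound to the loopback interface only.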
Make sure that you have not listed your master twice in the /etc/hosts file.
I had the following, which only allowed master to listen on 127.0.1.1:
127.0.1.1 hostname master
192.168.x.x hostname master
192.168.x.x slave-1
192.168.x.x slave-2
The listing above caused the problem. I changed my /etc/hosts file to the following to make it work:
127.0.1.1 hostname
192.168.x.x hostname master
192.168.x.x slave-1
192.168.x.x slave-2
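To see why the order of the lines matters, here is a small sketch that mimics a hosts-file lookup. It assumes the resolver returns the first line that mentions the name, which is the usual behavior but is not guaranteed on every system. The function first_address_for and the 192.168.0.x addresses are placeholders for illustration, standing in for the 192.168.x.x entries above.

```python
def first_address_for(hosts_text, name):
    """Return the address on the first hosts line that lists `name`, or None."""
    for raw_line in hosts_text.splitlines():
        # Strip comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return address
    return None


# Placeholder addresses; substitute your real ones.
BROKEN = """\
127.0.1.1 hostname master
192.168.0.10 hostname master
192.168.0.11 slave-1
192.168.0.12 slave-2
"""

FIXED = """\
127.0.1.1 hostname
192.168.0.10 hostname master
192.168.0.11 slave-1
192.168.0.12 slave-2
"""

if __name__ == "__main__":
    print(first_address_for(BROKEN, "master"))  # 127.0.1.1
    print(first_address_for(FIXED, "master"))   # 192.168.0.10
```

With the first file, master maps to the loopback address, so nothing outside the machine can reach it. With the second, master maps to the LAN address.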
Use the command netstat -an | grep :9000 to verify that your connections are working!
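netstat only shows the local side. To confirm from another machine that the port is really reachable, something like the sketch below could be run there. port_is_open is a hypothetical helper, and "master" is just the host name from the example hosts file above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Run this from a different machine than the one hosting Hadoop.
    for port in (9000, 9001):
        print(port, port_is_open("master", port))
```

If this reports False from a remote machine while netstat on the Hadoop machine shows the port listening on 127.0.0.1 or 127.0.1.1, the daemon is bound to loopback only.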