I'm building a local HDFS dev environment (actually Hadoop + Mesos + ZooKeeper + Kafka) to ease development of Spark jobs and facilitate local integration testing.
All other components are working fine, but I'm having issues with HDFS. When the DataNode tries to connect to the NameNode, I get a DisallowedDatanodeException:

org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode
Most questions about this issue boil down to name resolution of the DataNode at the NameNode, either statically through the /etc/hosts file or via DNS. Static resolution is not an option with Docker, since I don't know the DataNodes when the NameNode container is created. I would also like to avoid creating and maintaining an additional DNS service. Ideally, I would like to wire everything up using Docker's --link feature.
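For context, this is roughly how I intend to wire the containers (image and container names here are placeholders):

# --link gives the DataNode container a /etc/hosts entry and env vars for the NameNode,
# but not the other way around
docker run -d --name namenode my/hadoop-namenode
docker run -d --name datanode1 --link namenode:namenode my/hadoop-datanode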
Is there a way to configure HDFS so that it works using only IP addresses?
I found this property and set it to false, but it didn't do the trick:
dfs.namenode.datanode.registration.ip-hostname-check
(default: true)
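For reference, a minimal sketch of how I set it (in hdfs-site.xml on the NameNode; the rest of the configuration is omitted):

<!-- when false, the NameNode should not require the registering DataNode's
     address to resolve to a known hostname -->
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>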
Is there a way to run a multi-node local HDFS cluster using only IP addresses, without DNS?
I would look at reconfiguring your Docker image to use a different hosts file [1]. In particular:
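A sketch of one way to inject the mappings (container names, image, and IPs are placeholders, and this may not be the exact workaround from the linked comment) is to pass them with --add-host when starting the NameNode container, which appends entries to the container's /etc/hosts:

# hypothetical container/image names and addresses
docker run -d --name namenode \
  --add-host datanode1:172.17.0.3 \
  --add-host datanode2:172.17.0.4 \
  my/hadoop-namenode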
Hope this works for you!
[1] https://github.com/dotcloud/docker/issues/2267#issuecomment-40364340