How to make HDFS work in docker swarm

I'm having trouble getting my HDFS setup to work in Docker Swarm. To isolate the problem, I've reduced my setup to the minimum:

1 physical machine
1 namenode
1 datanode

This setup works fine with docker-compose, but it fails with Docker Swarm using the same compose file.

Here is the compose file:

version: '3'
services:
  namenode:
    image: uhopper/hadoop-namenode
    hostname: namenode
    ports:
      - "50070:50070"
      - "8020:8020"
    volumes:
      - /userdata/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=hadoop-cluster

  datanode:
    image: uhopper/hadoop-datanode
    depends_on:
      - namenode
    volumes:
      - /userdata/datanode:/hadoop/dfs/data
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020

To test it, I installed a Hadoop client on my host (physical) machine with only this simple configuration in core-site.xml:

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://0.0.0.0:8020</value></property>
</configuration>

Then I run the following command:

hdfs dfs -put test.txt /test.txt

With docker-compose (just running docker-compose up), this works and the file is written to HDFS.

With Docker Swarm, I run:

docker swarm init 
docker stack deploy --compose-file docker-compose.yml hadoop

Then, once all services are up, putting my file on HDFS fails like this:

INFO hdfs.DataStreamer: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/x.x.x.x:50010]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1692)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1648)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
18/06/14 17:29:41 WARN hdfs.DataStreamer: Abandoning BP-1801474405-10.0.0.4-1528990089179:blk_1073741825_1001
18/06/14 17:29:41 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.0.0.6:50010,DS-d7d71735-7099-4aa9-8394-c9eccc325806,DISK]
18/06/14 17:29:41 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

If I look at the web UI, the datanode appears to be up and no issue is reported.

Update: it seems that depends_on is ignored by swarm, but that does not seem to be the cause of my problem: I restarted the datanode once the namenode was up, and it did not help.

Thanks for your help :)

asked Jun 14 '18 15:06

Loic

1 Answer

The problem stems from the interaction between Docker Swarm's overlay networks and how the HDFS namenode keeps track of its datanodes. The namenode records each datanode's IP/hostname as seen on the overlay network. When an HDFS client asks to read or write, the namenode returns those overlay addresses, and the client then connects to the datanodes directly. In the question's log, 10.0.0.6:50010 looks like such an overlay address. Since the overlay network is not reachable from external clients, any read/write operation fails.
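
One way to confirm this diagnosis from the client machine is to try opening a TCP connection to the datanode address that the namenode hands out. The following is only a minimal sketch: the helper name can_connect is made up for illustration, it relies on bash's /dev/tcp feature and on timeout from GNU coreutils, and the address 10.0.0.6:50010 is simply the one shown in the question's log.

```shell
#!/usr/bin/env bash
# Sketch: check whether a datanode transfer address is reachable from here.
# can_connect is a hypothetical helper name, not part of Hadoop or Docker.

# can_connect HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
can_connect() {
  local host="$1" port="$2" wait_s="${3:-5}"
  # /dev/tcp/HOST/PORT is a bash redirection feature; timeout(1) bounds the wait.
  timeout "$wait_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example, run on the client machine (not executed here):
#   hdfs dfsadmin -report          # lists the datanode addresses the namenode knows
#   can_connect 10.0.0.6 50010 && echo reachable || echo unreachable
```

If the check fails from the host but succeeds from inside a container attached to the same overlay network, that is consistent with the explanation above.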

The solution I ended up with (after a lot of struggling to get the overlay network to work) was to have the HDFS services use the host network. Here's a snippet from the compose file:

version: '3.7'

x-deploy_default: &deploy_default
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
  restart_policy:
    condition: any
    delay: 5s

services:
  hdfs_namenode:
    deploy:
      <<: *deploy_default
    networks:
      hostnet: {}
    volumes:
      - hdfs_namenode:/hadoop-3.2.0/var/name_node
    command:
      namenode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0

  hdfs_datanode:
    deploy:
      mode: global
    networks:
      hostnet: {}
    volumes:
      - hdfs_datanode:/hadoop-3.2.0/var/data_node
    command:
      datanode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0
volumes:
  hdfs_namenode:
  hdfs_datanode:

networks:
  hostnet:
    external: true
    name: host
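
The compose file above refers to ${PRIMARY_HOST}, so that variable has to be present in the environment when the stack is deployed. A possible wrapper is sketched below; the function name deploy_hdfs_stack and the default file and stack names are assumptions (the stack name hadoop is borrowed from the question), and only the docker stack deploy invocation itself is taken from the question.

```shell
#!/usr/bin/env bash
# Sketch: deploy the stack only when PRIMARY_HOST is set.
# deploy_hdfs_stack is a hypothetical helper name used for illustration.

# deploy_hdfs_stack [COMPOSE_FILE] [STACK_NAME]
deploy_hdfs_stack() {
  local compose_file="${1:-docker-compose.yml}" stack_name="${2:-hadoop}"
  # The compose file substitutes ${PRIMARY_HOST} into the -fs argument.
  if [ -z "${PRIMARY_HOST:-}" ]; then
    echo "PRIMARY_HOST is not set" >&2
    return 1
  fi
  # Export so the docker CLI can see the variable during substitution.
  export PRIMARY_HOST
  docker stack deploy --compose-file "$compose_file" "$stack_name"
}

# Example (not executed here; manager.example.com is a placeholder):
#   PRIMARY_HOST=manager.example.com deploy_hdfs_stack docker-compose.yml hadoop
```

With this setup, a client outside the swarm would presumably point fs.defaultFS at hdfs://<PRIMARY_HOST>:9000, since the services now listen directly on the host's interfaces.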

answered Oct 02 '22 17:10

ftzeng12

Tags: docker, hadoop, hdfs, docker-swarm
