Does anyone know where the official Docker images for Hadoop (e.g. YARN, HDFS) are? I'd like to use them within a Docker image.
Apache Hadoop is a core big data technology. Running Hadoop on Docker is a great way to get up and running quickly. Below are the basic steps to create a simple Hadoop Docker image.
Hadoop daemons must be containerized to enable immutable and repeatable deployments. Cluster operations must be modeled using declarative concepts (instead of action-based imperative models). Any host in the cluster must be easily replaceable upon failure or degradation.
The Docker Official Images are a curated set of Docker repositories hosted on Docker Hub. They are designed to: Provide essential base OS repositories (for example, ubuntu, centos) that serve as the starting point for the majority of users.
Container represents an allocated resource in the cluster. The ResourceManager is the sole authority to allocate any Container to applications. The allocated Container is always on a single node and has a unique ContainerId. It has a specific amount of Resource allocated.
It's important to check if the chosen image includes only Hadoop.
(I'm not sure about the Cloudera image mentioned above.)
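One way to check what an image bundles is to look at its build history and environment. The following is only a sketch: the default image name is taken from the list below purely as an example, the `run` and `inspect_image` helpers are hypothetical names introduced here, and the idea that an image exposes variables such as `HADOOP_HOME` is an assumption that may not hold for every image. The script prints the commands by default; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: inspect how an image was built to judge whether it ships only Hadoop.
# IMAGE is a placeholder; substitute the image you are evaluating.
IMAGE="${1:-bde2020/hadoop-base}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: pull the image, then show its layers and environment.
inspect_image() {
    run docker pull "$1"
    # The layer-by-layer build commands show which packages were installed.
    run docker history --no-trunc "$1"
    # Environment variables can hint at bundled components
    # (assumption: the image sets variables such as HADOOP_HOME).
    run docker image inspect --format '{{json .Config.Env}}' "$1"
}

inspect_image "$IMAGE"
```

If the history shows extra services being installed (for example a database or another processing engine), the image is probably not Hadoop-only.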
Check out the alternatives below:
Sequenceiq:
Image (1M+ pulls)
GitHub repo
Site
Pull with: docker pull sequenceiq/hadoop-docker
Uhopper:
Image (1M+ pulls)
Bitbucket repo
Site
Pull with: docker pull uhopper/hadoop
Big Data Europe:
Image (10K+ pulls)
GitHub repo
Site
Pull with: docker pull bde2020/hadoop-base
Parrot Stream:
Image (1.2K+ pulls)
GitHub repo
Site
Pull with: docker pull parrotstream/hadoop
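To compare the candidates side by side, the four pulls above can be wrapped in a small loop. This is a minimal sketch: the image names come from the list above, while `pull_all` and the `DRY_RUN` switch are hypothetical names introduced for illustration. A pull without a tag requests `latest`, which a repository may not publish, so a failed pull may only mean that an explicit tag is needed. By default the script just prints the commands; set `DRY_RUN=0` to run them.

```shell
#!/bin/sh
# Image names taken from the list of alternatives.
IMAGES="sequenceiq/hadoop-docker uhopper/hadoop bde2020/hadoop-base parrotstream/hadoop"

# Hypothetical helper: pull every candidate image, or only print the
# commands when DRY_RUN=1 (the default).
pull_all() {
    for image in $IMAGES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "docker pull $image"
        else
            # Keep going if one repository is unavailable or lacks a default tag.
            docker pull "$image" || echo "pull failed: $image" >&2
        fi
    done
}

pull_all
```

After pulling, `docker images` lists the local copies with their sizes, which gives a rough first comparison.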
Bonus:
Check out this tutorial on how to build a Hadoop Docker image.