How to configure multi-node Apache Storm cluster

I'm following http://jayatiatblogs.blogspot.com/2011/11/storm-installation.html and http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup to set up an Apache Storm cluster on Ubuntu 14.04 LTS on AWS EC2.

My master node is 10.0.0.185. My slave nodes are 10.0.0.79, 10.0.0.124 and 10.0.0.84, with myid 1, 2 and 3 in their zookeeper-data directories respectively. I set up an Apache ZooKeeper ensemble consisting of all 3 slave nodes.

Below is the zoo.cfg for my slave nodes:

tickTime=2000
initLimit=10
syncLimit=5

dataDir=/home/ubuntu/zookeeper-data
clientPort=2181

server.1=10.0.0.79:2888:3888
server.2=10.0.0.124:2888:3888
server.3=10.0.0.84:2888:3888

autopurge.snapRetainCount=3
autopurge.purgeInterval=1
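Each ZooKeeper server learns its own id from the myid file in its dataDir, and that number has to match the N of the server.N line for that host, which is why 10.0.0.79, 10.0.0.124 and 10.0.0.84 get 1, 2 and 3. A minimal sketch, assuming the zoo.cfg text above, that derives the expected myid for each host; the helper names are hypothetical and the printed echo commands are only a suggestion for creating the files.

```python
import re

# zoo.cfg from the question (autopurge lines omitted; they don't affect ids).
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5

dataDir=/home/ubuntu/zookeeper-data
clientPort=2181

server.1=10.0.0.79:2888:3888
server.2=10.0.0.124:2888:3888
server.3=10.0.0.84:2888:3888
"""

# server.<id>=<host>:<follower-to-leader port>:<leader-election port>
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)$")


def ensemble_members(cfg_text):
    """Return {host: server_id} for every server.N entry in a zoo.cfg."""
    members = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            members[match.group(2)] = int(match.group(1))
    return members


def data_dir(cfg_text):
    """Return the dataDir value, where each server keeps its myid file."""
    for line in cfg_text.splitlines():
        if line.startswith("dataDir="):
            return line.split("=", 1)[1].strip()
    raise ValueError("zoo.cfg has no dataDir")


if __name__ == "__main__":
    myid_path = data_dir(ZOO_CFG) + "/myid"
    for host, server_id in ensemble_members(ZOO_CFG).items():
        # Command to run on `host` so its myid matches its server.N line.
        print(f"{host}: echo {server_id} > {myid_path}")
```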

Below is the storm.yaml for my slave nodes:

########### These MUST be filled in for a storm configuration
 storm.zookeeper.server:
     - "10.0.0.79"
     - "10.0.0.124"
     - "10.0.0.84"
#     - "localhost"
 storm.zookeeper.port: 2181

# nimbus.host: "localhost"
 nimbus.host: "10.0.0.185"

 storm.local.dir: "/home/ubuntu/storm/data"
 java.library.path: "/usr/lib/jvm/java-7-oracle"

 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
     - 6704
#
# worker.childopts: "-Xmx768m"
# nimbus.childopts: "-Xmx512m"
# supervisor.childopts: "-Xmx256m"
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"

## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
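Each port listed under supervisor.slots.ports is one worker slot on that supervisor, so the cluster-wide total the question expects is 5 ports times 3 slave nodes, i.e. 15. A minimal line-based sketch (not a real YAML parser; the function name is hypothetical) that counts the slots in the slave storm.yaml above and multiplies by the number of slave nodes:

```python
# supervisor.slots.ports excerpt from the slave storm.yaml in the question.
SLAVE_STORM_YAML = """\
 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
     - 6704
#
# worker.childopts: "-Xmx768m"
"""


def slot_ports(yaml_text):
    """Return the ports listed under an uncommented supervisor.slots.ports key.

    Line-based sketch, not a YAML parser: it collects the "- <port>" items
    directly after the key and stops at the first line that is not one.
    """
    ports = []
    in_list = False
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not in_list:
            # A commented-out key ("# supervisor.slots.ports:") never matches.
            in_list = line == "supervisor.slots.ports:"
        elif line.startswith("- "):
            ports.append(int(line[2:]))
        else:
            break  # blank line, comment or next key ends the list
    return ports


SLAVE_NODES = 3  # 10.0.0.79, 10.0.0.124 and 10.0.0.84

if __name__ == "__main__":
    per_supervisor = len(slot_ports(SLAVE_STORM_YAML))
    print(per_supervisor, "slots per supervisor")
    print(per_supervisor * SLAVE_NODES, "slots expected across the cluster")
```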

Below is the storm.yaml for my master node:

########### These MUST be filled in for a storm configuration
 storm.zookeeper.servers:
     - "10.0.0.79"
     - "10.0.0.124"
     - "10.0.0.84"
#     - "localhost"
#
 storm.zookeeper.port: 2181

 nimbus.host: "10.0.0.185"
# nimbus.thrift.port: 6627
# nimbus.task.launch.secs: 240

# supervisor.worker.start.timeout.secs: 240
# supervisor.worker.timeout.secs: 240

 ui.port: 8772

#  nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"

#  ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
#  supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
#  worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"

 storm.local.dir: "/home/ubuntu/storm/data"

 java.library.path: "/usr/lib/jvm/java-7-oracle"

# supervisor.slots.ports:
#     - 6700
#     - 6701
#     - 6702
#     - 6703
#     - 6704

# worker.childopts: "-Xmx768m"
# nimbus.childopts: "-Xmx512m"
# supervisor.childopts: "-Xmx256m"

# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"

## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
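Because the master and slave storm.yaml files were edited separately, comparing their uncommented keys is a quick way to catch accidental differences. A minimal sketch working line by line on excerpts of the two files above (not a YAML parser; the helper name is hypothetical). For these excerpts it reports that the slaves use storm.zookeeper.server while the master uses storm.zookeeper.servers, the key name in Storm's default storm.yaml, which may be worth aligning; supervisor.slots.ports and ui.port are expected to differ between the two roles.

```python
# Excerpts of the two storm.yaml files from the question.
SLAVE_YAML = """\
 storm.zookeeper.server:
     - "10.0.0.79"
     - "10.0.0.124"
     - "10.0.0.84"
#     - "localhost"
 storm.zookeeper.port: 2181
# nimbus.host: "localhost"
 nimbus.host: "10.0.0.185"
 storm.local.dir: "/home/ubuntu/storm/data"
 java.library.path: "/usr/lib/jvm/java-7-oracle"
 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
     - 6704
"""

MASTER_YAML = """\
 storm.zookeeper.servers:
     - "10.0.0.79"
     - "10.0.0.124"
     - "10.0.0.84"
 storm.zookeeper.port: 2181
 nimbus.host: "10.0.0.185"
 ui.port: 8772
 storm.local.dir: "/home/ubuntu/storm/data"
 java.library.path: "/usr/lib/jvm/java-7-oracle"
"""


def active_keys(yaml_text):
    """Return the set of keys on uncommented 'key:' / 'key: value' lines.

    Line-based sketch, not a YAML parser: comments, blank lines and list
    items ("- ...") are skipped, and nesting is ignored (fine for these
    files, whose only nested content is lists).
    """
    keys = set()
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


if __name__ == "__main__":
    slave, master = active_keys(SLAVE_YAML), active_keys(MASTER_YAML)
    print("only in slave storm.yaml: ", sorted(slave - master))
    print("only in master storm.yaml:", sorted(master - slave))
```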

I start ZooKeeper on all my slave nodes, then start Storm Nimbus on my master node, then start the Storm supervisor on all my slave nodes. However, the Storm UI shows only 1 supervisor with 5 slots in total in the cluster summary, and only 1 entry in the supervisor summary. Why is that?

How many slave nodes are actually working if I submit a topology in this case?

Why are there not 3 supervisors with 15 slots in total?

What should I do in order to have 3 supervisors?

When I check supervisor.log on the slave nodes, I see the following:

2015-05-29T09:21:24.185+0000 b.s.d.supervisor [INFO] 5019754f-cae1-4000-beb4-fa016bd1a43d still hasn't started
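To see how many workers are stuck and whether the same ones keep appearing, the supervisor.log messages above can be tallied. A minimal sketch that assumes the line format shown in the question (timestamp, logger, [LEVEL], then a 36-character id followed by "still hasn't started"); it treats that id as the id of a worker the supervisor launched, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed format, taken from the log line in the question:
# <timestamp> <logger> [<LEVEL>] <worker id> still hasn't started
NOT_STARTED = re.compile(
    r"^(?P<ts>\S+)\s+(?P<logger>\S+)\s+\[(?P<level>[A-Z]+)\]\s+"
    r"(?P<worker_id>[0-9a-f-]{36}) still hasn't started$"
)

SAMPLE_LOG = [
    "2015-05-29T09:21:24.185+0000 b.s.d.supervisor [INFO] "
    "5019754f-cae1-4000-beb4-fa016bd1a43d still hasn't started",
    "2015-05-29T09:21:24.686+0000 b.s.d.supervisor [INFO] some other message",
]


def stuck_workers(log_lines):
    """Count "still hasn't started" messages per worker id."""
    counts = Counter()
    for line in log_lines:
        match = NOT_STARTED.match(line.rstrip("\n"))
        if match:
            counts[match.group("worker_id")] += 1
    return counts


if __name__ == "__main__":
    # On a slave node you could pass an open file instead, e.g.
    # with open("supervisor.log") as log: print(stuck_workers(log))
    for worker_id, count in stuck_workers(SAMPLE_LOG).items():
        print(worker_id, count)
```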
asked May 29 '15 09:05 by Toshihiko


1 Answer

What you are doing is correct, and it works too.

The only thing you should change is your storm.local.dir. It is the same on the slave and the master nodes; change the storm.local.dir path on the Nimbus and supervisor nodes so that they don't use the same local path. When you use the same local path, Nimbus and the supervisor share the same id. The nodes still come into play, but instead of, for example, 8 slots you only see 4 slots as workers.
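The shared-id explanation can be illustrated with a toy model, assuming the UI reports one entry per supervisor id: if all three slaves report the same id, each registration replaces the previous one, leaving 1 supervisor with 5 slots; with distinct ids you get 3 supervisors and 15 slots. This is a simplified Python model, not Storm's implementation, and the ids are made up.

```python
from dataclasses import dataclass


@dataclass
class SupervisorInfo:
    host: str
    slots: int


def register_all(nodes):
    """Toy registry keyed by supervisor id.

    `nodes` is a list of (supervisor_id, host, slots). Because entries are
    stored in a dict keyed by id, a node reporting an id that is already
    present overwrites the earlier entry instead of adding a new one.
    """
    registry = {}
    for supervisor_id, host, slots in nodes:
        registry[supervisor_id] = SupervisorInfo(host, slots)
    return registry


def cluster_summary(registry):
    """(number of supervisors, total slots), as a cluster summary would show."""
    return len(registry), sum(info.slots for info in registry.values())


slaves = ["10.0.0.79", "10.0.0.124", "10.0.0.84"]

# All three nodes report the same id: the entries collapse into one.
shared = register_all([("same-id", host, 5) for host in slaves])
print(cluster_summary(shared))    # (1, 5)

# Each node reports its own id: three supervisors, 5 slots each.
distinct = register_all([(f"id-{i}", host, 5) for i, host in enumerate(slaves)])
print(cluster_summary(distinct))  # (3, 15)
```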

Change storm.local.dir (currently /home/ubuntu/storm/data) and don't use the same path for the supervisors and Nimbus.
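The answer's rule can also be applied as a quick check. A minimal sketch that groups nodes by storm.local.dir and reports any path used by more than one node; the per-node mapping is typed in by hand from the question's configs (in practice it would come from each node's storm.yaml), and the helper name is hypothetical. With the configuration in the question, all four nodes share /home/ubuntu/storm/data.

```python
from collections import defaultdict

# storm.local.dir per node, as configured in the question (all identical).
LOCAL_DIRS = {
    "10.0.0.185 (nimbus)": "/home/ubuntu/storm/data",
    "10.0.0.79 (supervisor)": "/home/ubuntu/storm/data",
    "10.0.0.124 (supervisor)": "/home/ubuntu/storm/data",
    "10.0.0.84 (supervisor)": "/home/ubuntu/storm/data",
}


def shared_local_dirs(local_dirs):
    """Return {path: [nodes]} for every storm.local.dir used by more than one node."""
    by_path = defaultdict(list)
    for node, path in local_dirs.items():
        by_path[path].append(node)
    return {path: nodes for path, nodes in by_path.items() if len(nodes) > 1}


if __name__ == "__main__":
    for path, nodes in shared_local_dirs(LOCAL_DIRS).items():
        print(f"{path} is shared by {len(nodes)} nodes: {', '.join(nodes)}")
```

Once every node has its own path, the function returns an empty dict.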

answered Sep 28 '22 01:09 by Dilip Bobby