Error running topology in production cluster with Apache Storm 1.0.0, topology does not start

I have a topology that runs well on a local cluster, but when I try to run it on a production cluster, the following happens:

  1. Nimbus is up
  2. The Storm UI is up
  3. The two workers I use are up
  4. ZooKeeper is up
  5. I run Storm with

    storm jar myjar.jar MyClass

  6. Nimbus submits the topology

  7. The topology and the workers appear in the Storm UI

BUT:

The topology does not start, even though its status is ACTIVE.

The topology's log file does not appear on the workers.

I see the following in supervisor.log on the worker:

2016-04-15 13:18:19.831 o.a.s.d.supervisor [WARN] There was a connection problem with nimbus. #error {
 :cause jobs-rec-storm-nimbus
 :via
 [{:type java.lang.RuntimeException
   :message org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: jobs-rec-storm-nimbus
   :at [org.apache.storm.security.auth.TBackoffConnect retryNext TBackoffConnect.java 64]}
  {:type org.apache.storm.thrift.transport.TTransportException
   :message java.net.UnknownHostException: jobs-rec-storm-nimbus
   :at [org.apache.storm.thrift.transport.TSocket open TSocket.java 226]}
  {:type java.net.UnknownHostException
   :message jobs-rec-storm-nimbus
   :at [java.net.AbstractPlainSocketImpl connect AbstractPlainSocketImpl.java 184]}]
 :trace
 [[java.net.AbstractPlainSocketImpl connect AbstractPlainSocketImpl.java 184]
  [java.net.SocksSocketImpl connect SocksSocketImpl.java 392]
  [java.net.Socket connect Socket.java 589]
  [org.apache.storm.thrift.transport.TSocket open TSocket.java 221]
  [org.apache.storm.thrift.transport.TFramedTransport open TFramedTransport.java 81]
  [org.apache.storm.security.auth.SimpleTransportPlugin connect SimpleTransportPlugin.java 103]
  [org.apache.storm.security.auth.TBackoffConnect doConnectWithRetry TBackoffConnect.java 53]
  [org.apache.storm.security.auth.ThriftClient reconnect ThriftClient.java 99]
  [org.apache.storm.security.auth.ThriftClient <init> ThriftClient.java 69]
  [org.apache.storm.utils.NimbusClient <init> NimbusClient.java 106]
  [org.apache.storm.utils.NimbusClient getConfiguredClientAs NimbusClient.java 78]
  [org.apache.storm.utils.NimbusClient getConfiguredClient NimbusClient.java 41]
  [org.apache.storm.blobstore.NimbusBlobStore prepare NimbusBlobStore.java 268]
  [org.apache.storm.utils.Utils getClientBlobStoreForSupervisor Utils.java 462]
  [org.apache.storm.daemon.supervisor$fn__9590 invoke supervisor.clj 942]
  [clojure.lang.MultiFn invoke MultiFn.java 243]
  [org.apache.storm.daemon.supervisor$mk_synchronize_supervisor$this__9351$fn__9369 invoke supervisor.clj 582]
  [org.apache.storm.daemon.supervisor$mk_synchronize_supervisor$this__9351 invoke supervisor.clj 581]
  [org.apache.storm.event$event_manager$fn__8903 invoke event.clj 40]
  [clojure.lang.AFn run AFn.java 22]
  [java.lang.Thread run Thread.java 745]]}
2016-04-15 13:18:19.831 o.a.s.d.supervisor [INFO] Finished downloading code for storm id jobs-KafkaMigration-topology-3-1460740616
2016-04-15 13:18:19.850 o.a.s.d.supervisor [INFO] Missing topology storm code, so can't launch worker with assignment ...(some more numbers)

So I assume that I have a connection problem with Nimbus, but the configuration file on the worker is:

 storm.zookeeper.servers:
     - "192.168.22.209"
     - "192.168.22.216"
     - "192.168.22.217"

 storm.local.dir: "/app/home/storm"

 storm.zookeeper.root: "/storm-prod"

 nimbus.seeds: ["192.168.120.96"]

And if I ping the Nimbus IP from the workers, it responds OK.
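Note that pinging by IP does not exercise name resolution: the lookup that actually fails in the stack trace is for the hostname jobs-rec-storm-nimbus. That lookup can be checked directly on a worker (assuming standard Linux tools):

    # On a worker node: does the hostname from the UnknownHostException resolve?
    getent hosts jobs-rec-storm-nimbus
    # No output means the worker cannot resolve the Nimbus hostname.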

Where is the error, and how can I fix it?

Thanks!

asked Apr 15 '16 by George C

2 Answers

What appears to happen here is that the Storm supervisor resolves Nimbus from whatever is configured in storm.yaml (nimbus.seeds or nimbus.host) the first time, and from then on uses the Nimbus hostname to download the topology artifacts.

If that is correct, working name resolution is mandatory for a cluster setup. This is far from ideal, especially when running containers in an orchestrated environment like Kubernetes.
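One way to satisfy that requirement without a full DNS setup (a sketch on my part; substitute the hostname and IP from your own cluster) is a static hosts entry on every supervisor node, mapping the Nimbus hostname from the stack trace to its IP:

    # /etc/hosts on each supervisor node (values taken from the question above)
    192.168.120.96    jobs-rec-storm-nimbus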

The workaround I'm currently using is adding

    storm.local.hostname: "<local.ip.value>"

to storm.yaml on each node.
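For context, a minimal worker-side storm.yaml with the workaround in place might look like this (a sketch built from the values in the question; storm.local.hostname must be set per node, to that node's own address):

    storm.zookeeper.servers:
        - "192.168.22.209"
        - "192.168.22.216"
        - "192.168.22.217"
    storm.zookeeper.root: "/storm-prod"
    storm.local.dir: "/app/home/storm"
    nimbus.seeds: ["192.168.120.96"]
    # Advertise this node by IP so peers never need to resolve its hostname
    storm.local.hostname: "192.168.22.216"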

Thanks to @bastien, who provided the tip on the Storm user mailing list.

answered Sep 20 '22 by Alberto


I ran into a similar issue. It turned out my firewall rules were blocking the supervisor ports. Make sure the supervisor and Nimbus are able to talk to each other; a quick check is sketched below.
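A way to verify connectivity (a sketch assuming the default ports: Nimbus Thrift on 6627, supervisor worker slots on 6700-6703; adjust if you changed nimbus.thrift.port or supervisor.slots.ports):

    # From a worker: can we reach the Nimbus Thrift port?
    nc -zv 192.168.120.96 6627

    # From the Nimbus host: can we reach a worker's slot ports?
    for port in 6700 6701 6702 6703; do nc -zv 192.168.22.209 $port; done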

answered Sep 23 '22 by shashank