This is an issue that is simply driving me nuts. I have a one machine Storm instance running on my Local LAN. I am currently running v0.9.1-incubating
release version (from the Apache Incubator site. The issue is simply that my storm supervisor
process refuses to start after EVERY SINGLE reboot. The hack fix is quite simple, remove the supervisor
and workers
folders from the storm local directory and re run the process; things run hunky dory then on until next reboot.
I'm providing every bit of information I think might be relevant to debug this issue. Please ask for more if needed, but just help me get some resolution.
PS: It doesn't matter if I have topologies running or not.
Supervisor config
[program:zookeeper]
command=/path/to/zookeeper/bin/zkServer.sh "start-foreground"
process_name=zookeeper
directory=/path/to/zookeeper/bin
stdout_logfile=/var/log/zookeeper.log ; stdout log path, NONE$
stderr_logfile=/var/log/err.zookeeper.log ; stderr log path, $
priority=2
user=root
[program:storm-nimbus]
command=/path/to/storm/bin/storm nimbus
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/nimbus.err.log
stdout_logfile=/var/log/storm/nimbus.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=10
[program:storm-ui]
command=/path/to/storm/bin/storm ui
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/ui.err.log
stdout_logfile=/var/log/storm/ui.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=500
[program:storm-supervisor]
command=/path/to/storm/bin/storm supervisor
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/supervisor.err.log
stdout_logfile=/var/log/storm/supervisor.log.log
logfile_maxbytes=20MB
logfile_backups=2
priority=600
[program:storm-logviewer]
command=/path/to/storm/bin/storm logviewer
user=root
autostart=true
autorestart=true
startsecs=10
startretries=2
log_stdout=true
log_stderr=true
stderr_logfile=/var/log/storm/log.err.log
stdout_logfile=/var/log/storm/log.out.log
logfile_maxbytes=20MB
logfile_backups=2
priority=900
Storm config
#Zookeeper
storm.zookeeper.servers:
- "192.168.1.11"
# Nimbus
nimbus.host: "192.168.1.11"
nimbus.childopts: '-Xmx1024m -Djava.net.preferIPv4Stack=true -Dprocess=storm'
# UI
ui.port: 9090
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true -Dprocess=storm"
# Supervisor
supervisor.childopts: '-Djava.net.preferIPv4Stack=true -Dprocess=storm'
# Worker
worker.childopts: '-Xmx768m -Djava.net.preferIPv4Stack=true -Dprocess=storm'
storm.local.dir: "/path/to/storm"
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100
Error message
Pastebin for log error message. I'm cross posting the relevant bits here.
java.lang.RuntimeException: java.io.EOFException
at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
at java.lang.Thread.run(Thread.java:679) [na:1.6.0_27]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2322) ~[na:1.6.0_27]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2791) ~[na:1.6.0_27]
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798) ~[na:1.6.0_27]
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298) ~[na:1.6.0_27]
at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
... 11 common frames omitted
2014-03-11 12:27:25 b.s.util [INFO] Halting process: ("Error when processing an event")
We had that exact same problem (supervisor crashing on start and same log error message) when we had a power outage on 2 of our development servers. I guess just stopping the server without previously stopping the supervisor would have the same effect.
The only working solution we found was to remove the "storm-local/supervisor" folder (I guess something in there got corrupted).
I too faced this similar issue. I used to remove the local folder always and restart the topology.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With