How to run WordCountTopology from storm-starter in Intellij

I have been working with Storm for a while, but now want to get started with developing Storm itself. As suggested, I am using IntelliJ (until now I used Eclipse and only wrote topologies against the Java API).

I was also looking at https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

This documentation is not complete. At first I was not able to run anything in IntelliJ. I figured out that I need to remove the scope of the storm-core dependency in the storm-starter pom.xml (found here: storm-starter with intellij idea,maven project could not find class).

After that I was able to build the project. I can also run ExclamationTopology with no problems within IntelliJ. However, WordCountTopology fails.

First I got the following error:

java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
Traceback (most recent call last):
  File "splitsentence.py", line 16, in <module>
    import storm
ImportError: No module named storm

I was able to resolve it via apt-get install python-storm (found on StackOverflow).

Update: installing python-storm is not required to make it work.

However, I don't speak Python and was wondering what the problem is and why I could resolve it like this. Just want to get deeper into it. Maybe someone can explain.

Unfortunately, I am getting a different error now:

java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
Traceback (most recent call last):
  File "splitsentence.py", line 18, in <module>
    class SplitSentenceBolt(storm.BasicBolt):
AttributeError: 'module' object has no attribute 'BasicBolt'

I did not find any solution on the Internet. Asking at [email protected] did not help either. I got the following suggestion:

I think that it was always assumed that a topology would be invoked through the storm command line. Thus the working directory would be ${STORM-INSTALLATION}/bin/storm. Since storm.py is in this directory, splitSentence.py would be able to find the storm modules. Can you set the working directory to a path where storm.py is present and then try? If it works, we can add it to the documentation later.

However, changing the working directory did not solve the problem.

And as I am not familiar with Python and as I am new to IntelliJ, I am stuck now. Because ExclamationTopology runs, I guess my basic setup is correct.

What am I doing wrong? Is it possible at all to run WordCountTopology in a LocalCluster from within IntelliJ?

Matthias J. Sax asked Aug 13 '15 10:08


2 Answers

Unfortunately, AFAIK you can't use the multilang feature with LocalCluster unless the topology is packaged into a jar.

ShellProcess relies on the codeDir of TopologyContext, which is set up by the supervisor. Workers are serialized to stormcode.ser, but multilang files must be extracted outside of the serialized file so that Python/Ruby/Node/etc. can load them.

Accomplishing this in distributed mode is easy because there is always a user-submitted jar, and the supervisor knows which jar the user submitted.

Accomplishing this in local mode is not easy, because the supervisor cannot know which jar the user submitted, and users can run a topology in local mode without packaging it at all.

So, in local mode the supervisor looks for a resource directory ("resources") in each jar on the classpath (each entry whose name ends with "jar") and copies the first occurrence to codeDir.

The storm jar command places the user's topology jar first on the classpath, so a topology started that way runs without this issue.
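To check which jar would supply the "resources" directory for a given classpath, the lookup described above can be approximated with a small script. This is only a sketch of the described behavior, not Storm's actual implementation: the function name first_jar_with_resources is made up for illustration, and it assumes the rule is "first classpath entry ending in jar that contains a resources/ entry".

```python
import os
import zipfile


def first_jar_with_resources(classpath, sep=os.pathsep):
    """Return the first classpath entry ending in 'jar' that contains a
    'resources/' entry, or None if no such jar exists.

    Hypothetical helper: it mimics the lookup described in the answer,
    it is not code taken from Storm.
    """
    for entry in classpath.split(sep):
        if not entry.endswith("jar"):
            # Plain class directories (typical for an IDE run) are skipped.
            continue
        try:
            with zipfile.ZipFile(entry) as jar:
                if any(name.startswith("resources/") for name in jar.namelist()):
                    return entry
        except (OSError, zipfile.BadZipFile):
            # Missing or unreadable jar: ignore it and keep looking.
            continue
    return None
```

Feeding it the value of the JVM's java.class.path property from an IntelliJ run configuration should show whether any jar on the classpath carries a resources/ directory at all. If it returns None, or a jar other than the one containing splitsentence.py, the Python files would not end up in codeDir under the assumption above.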

So when the topology is started from the IDE, it is expected that ShellProcess cannot find "splitsentence.py". That it was found in your case is probably due to your working directory or PYTHONPATH.
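The two errors in the question are consistent with Python first finding no storm module at all, and then finding some other module that happens to be named storm and has no BasicBolt class. That reading is an assumption, not something confirmed in this thread. A minimal way to check it is to ask Python where it resolves the module from; describe_module below is a hypothetical helper written for this purpose.

```python
import importlib


def describe_module(name, required_attr):
    """Report where `name` is imported from and whether it defines
    `required_attr`. Hypothetical diagnostic helper, not part of Storm."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return {"found": False, "path": None, "has_attr": False}
    return {
        "found": True,
        # Built-in modules have no __file__, hence the default of None.
        "path": getattr(module, "__file__", None),
        "has_attr": hasattr(module, required_attr),
    }


if __name__ == "__main__":
    # Run this with the same working directory and PYTHONPATH as the topology.
    print(describe_module("storm", "BasicBolt"))
```

If "path" points somewhere other than the storm.py shipped with Storm's multilang resources while "has_attr" is False, a different module is shadowing the expected one.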

Jungtaek Lim answered Nov 09 '22 03:11


I struggled with a similar issue, not with the sample topology, but with my own using a Python bolt.

I also got the "AttributeError: 'module' object has no attribute 'BasicBolt'" exception, both in local mode and when submitting to the cluster.

There are very few resources on this. I found your question and little else discussing the issue.

In case someone else has the same problem: make sure you include the correct Maven "multilang-python" dependency in your pom file. This packages the runtime dependencies needed to run your topology into the JAR file.

Will777 answered Nov 09 '22 03:11