Trouble importing boilerpipe in python

I'm building an application in Python that fetches news articles from RSS feeds. As part of the project, I decided to use boilerpipe to extract just the article content from the HTML page on which each article appears.

Although boilerpipe was originally written in Java, it is also available for Python. Its GitHub page is here: https://github.com/misja/python-boilerpipe

The problem is that I get an exception when I try to import it with:

from boilerpipe.extract import Extractor

The error I get is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win32\egg\boilerpipe\extract\__init__.py", line 12, in <module>
  File "C:\Python26\lib\site-packages\jpype\_jclass.py", line 54, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
jpype._jexception.ExceptionPyRaisable: java.lang.Exception: Class de.l3s.boilerpipe.sax.HTMLHighlighter not found

What might be causing this problem and how can I fix it?

user1106610 asked Mar 28 '26 21:03


2 Answers

This worked for me on Mac OS X 10.8.5 with Python 2.7.9:

pip install JPype1    # to install https://pypi.python.org/pypi/JPype1
pip install charade
git clone https://github.com/misja/python-boilerpipe.git
cd python-boilerpipe
sudo python setup.py install

Then you should be able to run the following in the Python console:

>>> from boilerpipe.extract import Extractor
>>> extractor = Extractor(extractor='ArticleExtractor', url="http://en.wikipedia.org/wiki/Main_Page")
>>> print(extractor.getText())
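If that import fails, it can help to know which Python-side dependency is actually missing. The snippet below is a minimal sketch that is not part of the original answer: `missing_modules` is a hypothetical helper, and the module names `jpype`, `charade` and `boilerpipe` are assumed from the packages installed above (JPype1, charade, python-boilerpipe). It only checks that the Python modules import; it cannot tell you whether the Java classes are on the JVM classpath.

```python
import importlib

# Assumed import names for the packages installed above:
# JPype1 -> "jpype", charade -> "charade", python-boilerpipe -> "boilerpipe".
REQUIRED_MODULES = ("jpype", "charade", "boilerpipe")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be imported, in order."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Only ImportError is treated as "missing"; other errors
            # (e.g. a JVM failing to start) are left to propagate.
            missing.append(name)
    return missing
```

Calling `missing_modules()` returns an empty list when all three modules import; otherwise it lists the ones to reinstall.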
asmaier answered Mar 31 '26 04:03


You are missing the boilerpipe Java package; you can download it here: http://code.google.com/p/boilerpipe/downloads/list

You have only installed the Python boilerpipe wrapper.
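The exception in the question comes from JPype's `JClass` reporting that `de.l3s.boilerpipe.sax.HTMLHighlighter` was not found. A `.jar` file is a ZIP archive in which each class is stored under a path such as `de/l3s/boilerpipe/sax/HTMLHighlighter.class`, so one way to check whether the Java package is really present is to scan a directory's `.jar` files with Python's `zipfile` module. This is a minimal sketch: `class_to_jar_entry` and `find_class_in_jars` are hypothetical helper names, and which directory the wrapper actually loads its jars from is an assumption you will need to verify for your installation.

```python
import os
import zipfile


def class_to_jar_entry(class_name):
    """Map a fully qualified Java class name to its path inside a .jar archive."""
    # "de.l3s.boilerpipe.sax.HTMLHighlighter" -> "de/l3s/boilerpipe/sax/HTMLHighlighter.class"
    return class_name.replace(".", "/") + ".class"


def find_class_in_jars(class_name, jar_dir):
    """Return the sorted paths of every .jar under jar_dir that contains class_name."""
    entry = class_to_jar_entry(class_name)
    hits = []
    for root, _dirs, files in os.walk(jar_dir):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except zipfile.BadZipfile:
                # Skip corrupt or truncated downloads instead of crashing.
                continue
    return sorted(hits)
```

For example, `find_class_in_jars("de.l3s.boilerpipe.sax.HTMLHighlighter", "C:\\Python26\\lib\\site-packages")` returning an empty list would mean no jar under that directory provides the class that JPype is looking for.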

Mutant answered Mar 31 '26 06:03