I'm looking for a package / module / function etc. that is approximately the Python equivalent of Arc90's readability.js
http://lab.arc90.com/experiments/readability
http://lab.arc90.com/experiments/readability/js/readability.js
so that I can give it some input.html and the result is cleaned up version of that html page's "main text". I want this so that I can use it on the server-side (unlike the JS version that runs only on browser side).
Any ideas?
PS: I have tried Rhino + env.js and that combination works but the performance is unacceptable it takes minutes to clean up most of the html content :( (still couldn't find why there is such a big performance difference).
Please try my fork https://github.com/buriy/python-readability which is fast and has all features of latest javascript version.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With