Suppose I downloaded the HTML code, and I can parse it. How do I get the "best" description of that website, if that website does not have meta-description tag?
You could take the first few sentences of the main text returned by something like Readability.
Safari 5 uses it, so it must be alright :)
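If pulling in a library is not an option, you can approximate the idea with the Python standard library alone. This sketch is much cruder than Readability's actual node scoring. It assumes the first `<p>` with at least `min_length` characters is the main content and returns its first `max_sentences` sentences. `describe`, `ParagraphCollector`, and both parameters are hypothetical names, and the sentence split is a naive regex on `.`, `!` and `?`.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the whitespace-normalised text of each <p>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script>/<style>
        self._buf = []

    def _flush(self):
        # Store the current paragraph's text, if any, and reset the buffer.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()  # an unclosed <p> is implicitly closed by the next one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # catch a trailing <p> that was never closed


def describe(html, max_sentences=2, min_length=40):
    """Assumed heuristic: first few sentences of the first substantial <p>."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    # Skip short paragraphs (menus, captions) unless nothing else is available.
    candidates = [p for p in parser.paragraphs if len(p) >= min_length]
    candidates = candidates or parser.paragraphs
    if not candidates:
        return ""
    sentences = re.split(r"(?<=[.!?])\s+", candidates[0])
    return " ".join(sentences[:max_sentences])
```

With a short navigation paragraph before the article, `describe` skips it and returns the first two sentences of the article paragraph. A real Readability port scores whole content blocks instead, so it copes far better with pages whose first long paragraph is not the main text.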
To follow up on the Readability suggestion above (Readability itself was inspired by the website Instapaper): the authors have released the JavaScript source at http://code.google.com/p/arc90labs-readability/. What's more, someone has ported it to Python: http://github.com/gfxmonk/python-readability. Rejoice!