There is a great bookmarklet script that takes an HTML document and, using JavaScript, extracts the main article content (like Instapaper, but better).
I want to know the most efficient way to use this same JavaScript on the server side with Rails 3.
Is it even possible? Ideally, the server (in Rails) would request a URL, parse the response using the JavaScript, return the processed text, and then persist it to a database.
I was thinking of just porting the script to Ruby, but this seems silly, especially since jQuery and JavaScript itself have a bunch of built-in functions for parsing a DOM. On the other hand, the script relies on DOM objects provided by the browser, so it might require a server-side browser?
Any suggestions?
We actually do this very thing in one of our webapps. If you want to implement this functionality server-side in your Ruby on Rails application, your best bet is to use a Ruby HTML/XML parsing library, such as Nokogiri.
I wrote an article specifically explaining how to strip out the important information from a linked webpage, like Instapaper does, using Ruby + Nokogiri.
Create a Printable Format for Any Webpage with Ruby and Nokogiri