I am developing an application in Rails which needs to check whether a given website uses search-engine-friendly URLs or not. A solution I have in mind is using Nokogiri to parse the site's HTML and look at the link tags to find URLs and see if they are search engine friendly. Is there any other way this can be done? Any help would be really great.
You have two problems here:
How do you formally (programmatically) define what a "search engine friendly" URL is? I'm assuming you have some way of doing this already (a rough sketch of one possible heuristic follows this list). So that leaves...
How to check all the links on a website.
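For (1), here is a minimal sketch of one possible heuristic, purely as an illustration. The `seo_friendly?` name and the pattern are my own assumptions: here a URL counts as "friendly" if it has no query string and its path is made of lowercase words separated by hyphens (e.g. `/blog/my-first-post`). Swap in whatever definition you actually settle on.

```ruby
require 'uri'

# Hypothetical check: no query string, and a path of lowercase
# hyphen-separated segments such as /blog/my-first-post
def seo_friendly?(url)
  uri = URI.parse(url.to_s)
  uri.query.nil? && !!(uri.path =~ %r{\A(/[a-z0-9]+(?:-[a-z0-9]+)*)*/?\z})
end
```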
So for (2) I would look at something like Anemone, which will make it easy for you to crawl complete websites:
Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful.
For simple crawling Anemone will even give you an array of all links on a page, so you won't necessarily need Nokogiri at all. For more complex stuff you may want to combine Anemone with something like Mechanize and Nokogiri. That depends on your requirements.
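Putting the two together, a minimal sketch might look like the following. It assumes the hypothetical `seo_friendly?` helper above, and `www.example.com` stands in for the site you want to check; Anemone's `Page#links` gives you the absolute URIs of the links found on each page.

```ruby
require 'anemone'

unfriendly = []

Anemone.crawl("http://www.example.com") do |anemone|
  anemone.on_every_page do |page|
    # page.links is an array of absolute URIs found on the page,
    # so no manual HTML parsing is needed for this part.
    page.links.each do |link|
      unfriendly << link unless seo_friendly?(link) # predicate sketched above
    end
  end
end

puts "URLs that don't look search engine friendly:"
puts unfriendly.uniq
```

From there you can report the offending URLs however your Rails app needs (store them, render them in a view, etc.).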