 

How to use rspec to test screen scraping?

I'm writing a site that is going to rely heavily on screen scraping. Because screen scraping is prone to breaking, I'd like to be notified somehow when there is a problem.

The solution I think will work is to write an rspec test for each site I want to support. The test will open a few remote pages from each site, run my scraper on them, and compare the result with the output I expect. I'd also like to run the same tests against locally cached copies of those pages, so I can tell whether my own code changes broke the scraper or the remote site changed. Ideally these tests would run once a day and notify me of any problems.
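A minimal, dependency-free Ruby sketch of the cached-versus-live idea: apply the same expectation to a cached copy first and then to the live page, so a broken scraper can be told apart from a changed site. `scrape_title`, `fetch_live`, and `diagnose` are hypothetical names, and the regex stands in for whatever HTML parsing a real scraper would do.

```ruby
require "net/http"
require "uri"

# Hypothetical scraper under test: extracts the text of the page's <title>.
# A regex keeps the sketch dependency-free; a real scraper would more
# likely use an HTML parser.
def scrape_title(html)
  html[%r{<title>\s*(.*?)\s*</title>}im, 1]
end

# Fetches the current version of a page from the remote site.
def fetch_live(url)
  Net::HTTP.get(URI(url))
end

# Applies the same expectation to the cached copy and, if given, the live page.
#   :code_broken  - the fixed cached copy no longer parses, so the scraper code broke
#   :site_changed - the cached copy parses but the live page does not
#   :ok           - everything that was checked matched
def diagnose(expected, cached_html, live_html = nil)
  return :code_broken unless scrape_title(cached_html) == expected
  return :ok if live_html.nil?

  scrape_title(live_html) == expected ? :ok : :site_changed
end
```

In an rspec suite, each site would get examples along these lines, with the cached HTML read from a fixture file and the live HTML coming from `fetch_live`.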

Eventually I'd like to turn this into a gem, because it's a recurring problem for me: I do a lot of scraping, and it would be nice to know when things break.

So my problem is that I'm relatively new to writing tests for my code, and I have no clue what the best way to set this up is.

asked Nov 15 '12 at 19:11 by hadees



1 Answer

Take a look at the VCR gem. It lets you record local copies of the pages you want to test, re-record them every so often, and still run the same tests against the live pages.
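A minimal `spec/spec_helper.rb` sketch of such a VCR setup, assuming the `vcr` and `webmock` gems are installed. The cassette directory and the `LIVE` environment variable are hypothetical conventions chosen for illustration, not VCR defaults.

```ruby
# spec/spec_helper.rb (sketch; assumes the vcr and webmock gems)
require "vcr"
require "webmock"

VCR.configure do |config|
  # Recorded HTTP responses ("cassettes") are stored here as YAML files.
  config.cassette_library_dir = "spec/cassettes"
  # Intercept HTTP requests through WebMock.
  config.hook_into :webmock
  # Lets an example group opt in with the :vcr metadata tag; the cassette
  # name is derived from the example descriptions.
  config.configure_rspec_metadata!
  config.default_cassette_options = {
    # Re-record a cassette once it is older than one day (value in seconds).
    re_record_interval: 24 * 60 * 60,
    # Hypothetical LIVE switch: :all re-records every request, so the suite
    # hits the real sites; :once replays an existing cassette.
    record: ENV["LIVE"] ? :all : :once
  }
end
```

An example group tagged `:vcr` then replays the cached pages by default, while a daily cron job running `LIVE=1 rspec` checks the scrapers against the live sites.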

answered Nov 20 '22 at 21:11 by x1a4
