
Unit tests for screen-scraping?

I'm new to unit testing so I'd like to get the opinion of some who are a little more clued-in.

I need to write some screen-scraping code shortly. The target system is a web UI, so there'll be copious HTML parsing and similar volatile goodness involved. The target system will never notify me of changes (e.g. a site redesign or a change in functionality), so I anticipate my code breaking regularly.

So I think my real question is, how much, if any, of my unit testing should worry about or deal with the interface (the website I'm scraping) changing?

I think unit tests or not, I'm going to need to test heavily at runtime since I need to ensure the data I'm consuming is pristine. Even if I ran unit tests prior to every run, the web UI could still change between tests and runtime.

So do I focus on in-code testing and exception handling? Does that mean drawing a line in the sand and excluding this kind of testing from unit tests altogether?

Thanks

Chris asked Dec 08 '09 16:12



1 Answer

Unit testing should always be designed to have repeatable known results.

Therefore, to unit test a screen-scraper, you should write the test against a known, fixed set of HTML (you may use a mock object to stand in for the live page).
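As a rough illustration of testing against known HTML, here is a minimal sketch using only the Python standard library. The fixture markup, the `PriceParser` class, and the `extract_prices` function are all hypothetical names invented for this example; your real parser and page structure will differ.

```python
import unittest
from html.parser import HTMLParser

# Hypothetical fixture: a saved snapshot of the target page's markup.
KNOWN_HTML = """
<html><body>
  <table id="prices">
    <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="name">Gadget</td><td class="price">24.50</td></tr>
  </table>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects the text of every <td class="price"> cell as a float."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "td" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(float(text))

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_price = False


def extract_prices(html):
    """Hypothetical scraper entry point: returns all prices found in html."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


class ExtractPricesTest(unittest.TestCase):
    def test_known_page(self):
        # Same input every run, so the expected result never changes.
        self.assertEqual(extract_prices(KNOWN_HTML), [9.99, 24.5])

    def test_page_without_prices(self):
        self.assertEqual(extract_prices("<html><body></body></html>"), [])
```

Saved in a test module, this can be run with `python -m unittest`. Because the test never touches the network, a failure means the parsing code changed, not the website.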

The sort of thing you are talking about (the live site changing underneath you) doesn't really sound like a scenario for unit testing to me. If you want to ensure your code runs as robustly as possible, then it is more, as you say, about in-code testing and exception handling.

I would also include some alerting code, so that the system makes you aware of any occasion when the HTML does not get parsed as expected.
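One possible shape for that runtime checking and alerting, sketched in Python. Everything here is an assumption made for illustration: `UnexpectedMarkupError`, `validate_prices`, and `run_scrape` are hypothetical names, the "prices must be positive floats" rule stands in for whatever sanity checks fit your data, and the default alert simply logs an error (you might send an email or page someone instead).

```python
import logging

logger = logging.getLogger("scraper")


class UnexpectedMarkupError(Exception):
    """Raised when scraped data does not have the expected shape."""


def validate_prices(prices, minimum_rows=1):
    """Sanity-check parsed values; raise if they look wrong."""
    if len(prices) < minimum_rows:
        raise UnexpectedMarkupError(
            f"expected at least {minimum_rows} price(s), got {len(prices)}"
        )
    for price in prices:
        if not isinstance(price, float) or price <= 0:
            raise UnexpectedMarkupError(f"implausible price: {price!r}")
    return prices


def run_scrape(fetch, parse, alert=logger.error):
    """Fetch, parse, and validate; on failure call alert and return None.

    fetch() returns the page HTML and parse(html) returns a list of
    prices. Both are supplied by the caller.
    """
    try:
        return validate_prices(parse(fetch()))
    except (UnexpectedMarkupError, ValueError) as exc:
        alert(f"scrape failed, target markup may have changed: {exc}")
        return None
```

A caller would pass its real download and parse functions, for example `run_scrape(download_page, parse_prices)` (both hypothetical), and treat a `None` result as "do not consume this run's data".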

DanSingerman answered Nov 20 '22 08:11