How can I prevent my ASP.NET 3.5 website from being screen scraped by my competitor? Ideally, I want to ensure that no webbots or screen scrapers can extract data from my website.
Is there a way to detect that a webbot or screen scraper is running?
Use CAPTCHAs if you suspect that your website is being accessed by a scraper. CAPTCHAs ("Completely Automated Public Turing test to tell Computers and Humans Apart") are very effective at stopping scrapers.
Many websites do not have any anti-scraping mechanism, but some do block scrapers because they do not believe in open data access.
It is possible to try to detect screen scrapers:
Use cookies and timing checks; this makes things harder for out-of-the-box screen scrapers. Also check for JavaScript support, since most scrapers do not have it. Check the browser metadata sent with each request (for example, the User-Agent header) to verify that the client really is a web browser.
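As a rough illustration of combining these signals, here is a minimal Python sketch (the same logic could be ported to an ASP.NET handler). The function name `suspicion_score`, the marker list, and the weights are all hypothetical choices, not an established standard, and every one of these signals can be faked by a determined scraper.

```python
# Hypothetical substrings that commonly appear in the User-Agent of HTTP tools.
KNOWN_TOOL_MARKERS = ("curl", "wget", "python-requests", "scrapy", "libwww-perl")


def suspicion_score(headers, has_session_cookie, passed_js_check):
    """Return a heuristic score; higher means more scraper-like.

    headers: dict of request header names to values.
    has_session_cookie: True if the client sent back a cookie we set earlier.
    passed_js_check: True if the client returned a value that only
        JavaScript running in the page could have produced.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()

    score = 0
    if not user_agent:
        score += 2  # browsers normally send a User-Agent
    elif any(marker in user_agent for marker in KNOWN_TOOL_MARKERS):
        score += 2  # self-identified automation tool
    if "accept-language" not in normalized:
        score += 1  # browsers normally send Accept-Language
    if not has_session_cookie:
        score += 1  # client ignored our cookie
    if not passed_js_check:
        score += 1  # client did not run JavaScript
    return score
```

A score above some threshold you pick could trigger a CAPTCHA rather than an outright block, which limits the damage from false positives.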
You can also count requests per minute. A user driving a browser can only make a small number of requests per minute, so server-side logic that detects too many requests per minute can presume that screen scraping is taking place and deny access from the offending IP address for some period of time. If this starts to affect legitimate crawlers, log each blocked IP address and allow those addresses as needed.
You can also use http://www.copyscape.com/ to protect your content; this will at least tell you who is reusing your data.
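The per-minute check described above could be sketched roughly as follows, in Python for brevity. The class name `RequestRateLimiter`, its parameters, and the default thresholds are assumptions made for illustration; state is kept in memory, so this would not work unchanged across several web servers.

```python
import time
from collections import defaultdict, deque


class RequestRateLimiter:
    """Sliding-window request counter with temporary IP blocking (illustrative)."""

    def __init__(self, max_requests=60, window_seconds=60.0,
                 block_seconds=600.0, allowlist=None, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.allowlist = set(allowlist or ())  # e.g. crawlers you want to let in
        self.block_log = []                    # (ip, time) for each block issued
        self._clock = clock
        self._hits = defaultdict(deque)        # ip -> timestamps of recent requests
        self._blocked_until = {}               # ip -> time when the block expires

    def allow(self, ip):
        """Record one request from ip and return True if it may proceed."""
        if ip in self.allowlist:
            return True
        now = self._clock()

        blocked_until = self._blocked_until.get(ip)
        if blocked_until is not None:
            if now < blocked_until:
                return False
            del self._blocked_until[ip]  # block has expired

        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)

        if len(hits) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            self.block_log.append((ip, now))  # review later to allow crawlers
            hits.clear()
            return False
        return True
```

In a request handler you would call `allow(client_ip)` once per request and return an error or a CAPTCHA when it is False. Reviewing `block_log` is how you would spot legitimate crawlers to add to the allowlist.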
See this question also:
Protection from screen scraping
Also take a look at
http://blockscraping.com/
Nice doc about screen scraping:
http://www.realtor.org/wps/wcm/connect/5f81390048be35a9b1bbff0c8bc1f2ed/scraping_sum_jun_04.pdf?MOD=AJPERES&CACHEID=5f81390048be35a9b1bbff0c8bc1f2ed
How to prevent screen scraping:
http://mvark.blogspot.com/2007/02/how-to-prevent-screen-scraping.html
Unplug the network cable to the server.
To paraphrase: if the public can see it, it can be scraped.
Update: on second look, it appears that I am not answering the question. Sorry. Vecdid has offered a good answer.
But any half-decent coder could defeat the measures listed. In that context, my answer could be considered valid.
I don't think it is possible without authenticating users to your site.