
Bypassing Cloudflare ScrapeShield

I'm working on a web scraping project, and I'm running into problems with Cloudflare ScrapeShield. Does anyone know how to get around it? I'm using Selenium WebDriver, which ScrapeShield redirects to some lightspeed page. The scraper is written in Python and drives Firefox. Browsing the same site normally does not trigger the redirect. Is there something that WebDriver does differently from a regular browser?

Asked by Namrop on Jan 05 '14 08:01



1 Answer

ScrapeShield checks whether you are using a real browser, essentially by probing it for certain quirks or bugs. For example, suppose Chrome could not render an iframe that received a 303 response at the same time. Different browsers react differently to tests like this, and a WebDriver-controlled browser may not react the way the real browser would, so the system concludes "We've got an intruder, redirect the page!" I think this is right, but I'm not 100% sure.
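To make the idea concrete, here is a minimal Python sketch of the kind of comparison described above. It is hypothetical: the probe names, the expected values, and the `looks_automated` helper are illustrative assumptions, not ScrapeShield's actual checks. The one real-world signal it borrows is `navigator.webdriver`, which the W3C WebDriver specification has browsers report as `true` while they are under automation.

```python
# Hypothetical sketch: compare the results a client reports for a few
# behavioural probes against what a known browser is expected to produce.
# Probe names and expected values are illustrative assumptions, not
# Cloudflare's actual checks.
from typing import Dict

# Expected probe results for a hypothetical "real Firefox" profile.
EXPECTED_FIREFOX: Dict[str, object] = {
    "navigator.webdriver": False,  # WebDriver-controlled browsers report True
    "iframe_quirk_test": "pass",   # hypothetical engine-specific quirk probe
}


def looks_automated(reported: Dict[str, object],
                    expected: Dict[str, object] = EXPECTED_FIREFOX) -> bool:
    """Return True if any probe result is missing or differs from the
    expected browser profile."""
    for probe, value in expected.items():
        if reported.get(probe) != value:
            return True
    return False


if __name__ == "__main__":
    normal = {"navigator.webdriver": False, "iframe_quirk_test": "pass"}
    driven = {"navigator.webdriver": True, "iframe_quirk_test": "pass"}
    print(looks_automated(normal))  # False
    print(looks_automated(driven))  # True
```

A real system would collect such probe results with client-side JavaScript and would likely weigh many more signals; redirecting on a single mismatch is a simplification.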

More info on the source:

I found most of this information in a DEF CON talk about web sniffers and how to prevent them from getting accurate vulnerability information from the server. The speaker also built a web browser identifier in PHP.

Answered by Cold Diamondz on Oct 10 '22 04:10
