I'm working on a web scraping project, and I am running into problems with Cloudflare ScrapeShield. Does anyone know how to get around it? I'm using Selenium WebDriver, built with Python on top of Firefox, and ScrapeShield redirects it to some lightspeed page. Browsing normally does not cause the redirect. Is there something that WebDriver does differently from a regular browser?
See, what ScrapeShield does is check whether you are using a real browser, essentially by testing your browser for certain known bugs. Let's say, for example, that Chrome can't process an iframe if there is a 303 response at the same time. Different web browsers react differently to different tests, so WebDriver must not be reacting to these the way a real browser would, causing the system to say "We got an intruder, change the page!" I might be correct, but I'm not 100% sure.
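To make the idea concrete, here is a minimal sketch in Python of how a server might compare a client's behaviour against the browser it claims to be. This is only an illustration of the general technique described above, not ScrapeShield's actual logic. The profile table, the test names (`iframe_after_redirect`, `css_quirk_a`, `js_quirk_b`), and the functions `matches_claimed_browser` and `choose_response` are all made up for this example.

```python
from typing import Dict

# Hypothetical quirk profiles: test name -> outcome expected from each browser.
# The test names and values are invented for illustration only.
KNOWN_PROFILES: Dict[str, Dict[str, bool]] = {
    "firefox": {"iframe_after_redirect": True, "css_quirk_a": False, "js_quirk_b": True},
    "chrome": {"iframe_after_redirect": False, "css_quirk_a": True, "js_quirk_b": True},
}


def matches_claimed_browser(claimed: str, observed: Dict[str, bool]) -> bool:
    """Return True only if every test result matches the claimed browser's profile."""
    profile = KNOWN_PROFILES.get(claimed.lower())
    if profile is None:
        # Unknown browser name: there is nothing to compare against, so reject.
        return False
    # A missing test result yields None, which never equals the expected bool.
    return all(observed.get(test) == expected for test, expected in profile.items())


def choose_response(claimed: str, observed: Dict[str, bool]) -> str:
    """Serve the real page to consistent clients and a decoy page to everyone else."""
    return "real_page" if matches_claimed_browser(claimed, observed) else "decoy_page"
```

A client that says it is Firefox but fails even one of the Firefox tests gets the decoy page, which would look like the redirect described in the question.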
More info on the source:
I found most of this information in a DEF CON talk about web sniffers and how to prevent them from getting accurate vulnerability information about the server. The speaker also made a web browser identifier in PHP.