I have built a python script that uses Selenium to web-scrape. This script needs to run hours at a time. I am only scraping one website in particular and I have so far been able to scrape peacefully by just rotating browser User Agents from a pool of 1,000 agents.
However, I just scaled my script up using multi-threading and suddenly all of my attempts to visit the website when scraping fail due to CAPTCHA.
Apparently, rotating proxies is the answer. How can I rotate proxies with Selenium?
One way to do it is by using http_request_randomizer (explanation in code comments). As you may know free public proxies are highly unreliable, insecure and prone to getting banned. So I wouldn't recommend using this method for a serious project or in production.
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy
from selenium import webdriver
req_proxy = RequestProxy() #you may get different number of proxy when you run this at each time
proxies = req_proxy.get_proxy_list() #this will create proxy list
PROXY = proxies[5].get_address() #select the 6th proxy from the list, of course you can randomly loop through proxies
print(proxies[5].country)
webdriver.DesiredCapabilities.CHROME['proxy'] = {
"httpProxy": PROXY,
"ftpProxy": PROXY,
"sslProxy": PROXY,
"proxyType": "MANUAL",
}
driver = webdriver.Chrome()
driver.get('https://www.expressvpn.com/what-is-my-ip')
The best way to do this, involves a paid proxy service. I'm currently using https://luminati.io/ in a production environment and their service is very reliable, plus it rotates your IP automatically and frequently (almost every request).
See:
Luminati
how to set proxy with authentication in selenium chromedriver python?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With