
How to rotate the Selenium web browser's IP address

I have a Python script that visits a website every 30 seconds, and I need a different IP address for each visit.

What would be the best or most time-efficient solution?

  • Scraping free proxies online? Is there a Python script that gathers proxies from many sources?

  • Using the Tor browser to get a different IP each time? (I'm running Selenium on an AWS EC2 instance; does anyone know a tutorial on using the Tor browser on Ubuntu Server?)

  • Other methods?

Timothée asked Dec 19 '19 12:12


People also ask

What is a rotating proxy?

A rotating proxy is a proxy server that assigns a new IP address from the proxy pool for every connection. That means you can launch a script to send 10,000 requests to any number of sites and get 10,000 different IP addresses.
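
As a rough client-side illustration of the same idea, the sketch below cycles through a fixed pool and produces one Chrome --proxy-server argument per visit. The pool addresses are placeholders from a documentation-only range, and proxy_arguments is a hypothetical helper, not part of Selenium or of any proxy service.

```python
from itertools import cycle

# Placeholder pool (documentation-only addresses); a real pool would come
# from a proxy provider or from a scraped list.
PROXY_POOL = ["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.12:8000"]


def proxy_arguments(pool, visits):
    """Return one --proxy-server argument per visit, cycling through the pool."""
    rotation = cycle(pool)
    return ["--proxy-server={}".format(next(rotation)) for _ in range(visits)]


# Five visits against a pool of three: the fourth visit wraps around to the first proxy.
for argument in proxy_arguments(PROXY_POOL, 5):
    print(argument)
```

Each string can be handed to ChromeOptions.add_argument before the driver for that visit is created.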


2 Answers

To gather and use different proxies, a robust solution is to scrape the currently active proxies listed on the Free Proxy List and route requests to the website through them:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    # Selenium 4 style: the chromedriver path is passed through a Service object.
    CHROMEDRIVER = r'C:\WebDrivers\chromedriver.exe'
    TABLE = "//table[@class='table table-striped table-bordered dataTable']"

    # Step 1: scrape the IP and port columns from the Free Proxy List.
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(service=Service(CHROMEDRIVER), options=options)
    driver.get("https://sslproxies.org/")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, TABLE + "//th[contains(., 'IP Address')]"))))
    ips = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, TABLE + "//tbody//tr[@role='row']/td[position() = 1]")))]
    ports = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, TABLE + "//tbody//tr[@role='row']/td[position() = 2]")))]
    driver.quit()
    proxies = [ip + ':' + port for ip, port in zip(ips, ports)]
    print(proxies)

    # Step 2: try each proxy until the check page loads through one of them.
    for proxy in proxies:
        print("Proxy selected: {}".format(proxy))
        options = webdriver.ChromeOptions()
        options.add_argument('--proxy-server={}'.format(proxy))
        driver = webdriver.Chrome(service=Service(CHROMEDRIVER), options=options)
        try:
            driver.get("https://www.whatismyip.com/proxy-check/?iref=home")
            # until() returns a WebElement, so compare against its .text
            if "Proxy Type" in WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.card-text"))).text:
                print("Proxy Invoked")
                break  # keep this browser open; it is using a working proxy
        except Exception:
            pass
        # The proxy failed or the check did not pass: close this browser and try the next one.
        driver.quit()
    
  • Console Output:

    ['190.7.158.58:39871', '175.139.179.65:54980', '186.225.45.146:45672', '185.41.99.100:41258', '43.230.157.153:52986', '182.23.32.66:30898', '36.37.160.253:31450', '93.170.15.214:56305', '36.67.223.67:43628', '78.26.172.44:52490', '36.83.135.183:3128', '34.74.180.144:3128', '206.189.122.177:3128', '103.194.192.42:55546', '70.102.86.204:8080', '117.254.216.97:23500', '171.100.221.137:8080', '125.166.176.153:8080', '185.146.112.24:8080', '35.237.104.97:3128']
    
    Proxy selected: 190.7.158.58:39871
    
    Proxy selected: 175.139.179.65:54980
    
    Proxy selected: 186.225.45.146:45672
    
    Proxy selected: 185.41.99.100:41258
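
Launching a browser for every candidate is slow when most free proxies are dead. One possible shortcut, not part of the code above, is to check first whether the proxy's port accepts a TCP connection at all and to start Chrome only for those that do. is_reachable is a hypothetical helper, and a successful connection does not prove that the proxy actually forwards traffic.

```python
import socket


def is_reachable(proxy, timeout=3.0):
    """Return True if a TCP connection to 'host:port' succeeds within the timeout."""
    host, _, port = proxy.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and DNS failures;
        # ValueError covers a missing or non-numeric port.
        return False


def filter_reachable(proxies, timeout=3.0):
    """Keep only the proxies whose port currently accepts connections."""
    return [proxy for proxy in proxies if is_reachable(proxy, timeout)]
```

The filtered list can then replace proxies in the loop that launches Chrome, so the slower browser check only runs against candidates that respond.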
    
undetected Selenium answered Sep 20 '22 11:09


The site 'https://sslproxies.org/' seems to have been updated. Here is the updated code:

from selenium import webdriver
from selenium.webdriver.common.by import By
import chromedriver_autoinstaller # pip install chromedriver-autoinstaller

chromedriver_autoinstaller.install() # To update your chromedriver automatically
driver = webdriver.Chrome()

# Get free proxies for rotating
def get_free_proxies(driver):
    driver.get('https://sslproxies.org')

    table = driver.find_element(By.TAG_NAME, 'table')
    thead = table.find_element(By.TAG_NAME, 'thead').find_elements(By.TAG_NAME, 'th')
    tbody = table.find_element(By.TAG_NAME, 'tbody').find_elements(By.TAG_NAME, 'tr')

    headers = []
    for th in thead:
        headers.append(th.text.strip())

    proxies = []
    for tr in tbody:
        proxy_data = {}
        tds = tr.find_elements(By.TAG_NAME, 'td')
        for i in range(len(headers)):
            proxy_data[headers[i]] = tds[i].text.strip()
        proxies.append(proxy_data)
    
    return proxies


free_proxies = get_free_proxies(driver)
driver.quit()  # close the browser once the table has been read

print(free_proxies)

The output is a list of Python dictionaries like this:

[{'IP Address': '200.85.169.18',
  'Port': '47548',
  'Code': 'NI',
  'Country': 'Nicaragua',
  'Anonymity': 'elite proxy',
  'Google': 'no',
  'Https': 'yes',
  'Last Checked': '8 secs ago'},
 {'IP Address': '191.241.226.230',
  'Port': '53281',
  'Code': 'BR',
  'Country': 'Brazil',
  'Anonymity': 'elite proxy',
  'Google': 'no',
  'Https': 'yes',
  'Last Checked': '8 secs ago'},
.
.
.
}]
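
To feed these dictionaries into Chrome, one option is to keep only the rows marked as HTTPS-capable and join the IP and port into the ip:port form that the --proxy-server argument expects. usable_proxies is a hypothetical helper, and the sample rows are trimmed copies of the output above.

```python
def usable_proxies(free_proxies):
    """Keep HTTPS-capable entries and format them as 'ip:port' strings."""
    addresses = []
    for entry in free_proxies:
        ip = entry.get("IP Address")
        port = entry.get("Port")
        # Skip rows that are not HTTPS-capable or that lack an IP or port.
        if entry.get("Https") == "yes" and ip and port:
            addresses.append("{}:{}".format(ip, port))
    return addresses


sample = [
    {"IP Address": "200.85.169.18", "Port": "47548", "Https": "yes"},
    {"IP Address": "191.241.226.230", "Port": "53281", "Https": "yes"},
]
print(usable_proxies(sample))
# ['200.85.169.18:47548', '191.241.226.230:53281']
```

Each returned string can then be passed to options.add_argument('--proxy-server={}'.format(address)) when creating the driver.
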
Mahmudur Rahman Shovon answered Sep 20 '22 11:09