
Using a proxy with ChromeDriver on Google Compute Engine

I'm trying to use a proxy with ChromeDriver on Google Compute Engine.

I've tried many of the suggested solutions (see below), but every time the reported IP was still the Google server's own address.

Attempt 1:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType


chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--ignore-certificate-errors")

myproxy = '207.157.25.44:80'
prox = Proxy()
prox.proxy_type = ProxyType.MANUAL
prox.http_proxy = myproxy
prox.ssl_proxy = myproxy

# copy() so the shared DesiredCapabilities.CHROME dict is not modified in place
capabilities = webdriver.DesiredCapabilities.CHROME.copy()
prox.add_to_capabilities(capabilities)

driver = webdriver.Chrome(options=chrome_options, 
    executable_path="/user/sebastien/chromedriver", 
    desired_capabilities=capabilities)
driver.get("https://www.whatismyip.com/")
get_location()


Attempt 2:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--ignore-certificate-errors")

myproxy = '207.157.25.44:80'
# note: network.proxy.* are Firefox preference names; Chrome does not use them
prefs = {}
prefs["network.proxy.type"] = 1
prefs["network.proxy.http"] = myproxy
prefs["network.proxy.ssl"] = myproxy

chrome_options.add_experimental_option('prefs', prefs)

driver = webdriver.Chrome(options=chrome_options, 
    executable_path="/user/sebastien/chromedriver")
driver.get("https://www.whatismyip.com/")
get_location()

Attempt 3:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--ignore-certificate-errors")

myproxy = '207.157.25.44:80'
chrome_options.add_argument("--proxy-server=http://%s" % myproxy)

driver = webdriver.Chrome(options=chrome_options,
    executable_path="/user/sebastien/chromedriver")
driver.get("https://www.whatismyip.com/")
get_location()

None of them reached the website with the proxy's IP.

Again, this issue occurs when running the code on a GCP Compute Engine instance (Canonical Ubuntu 16.04 LTS, amd64 xenial).

Below is the function used to test the IP:

import json
from urllib.request import urlopen

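def get_location(ip=False):
    # with an ip: look up that address; without one: ipinfo.io reports the
    # public IP of this Python process (urllib does not go through Chrome)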
    if ip:
        html = urlopen(f"http://ipinfo.io/{str(ip).split(':')[0]}/json")
    else:
        html = urlopen("http://ipinfo.io/json")

    data = json.loads(html.read().decode('utf-8'))
    IP = data['ip']
    org = data['org']
    city = data['city']
    country = data['country']
    region = data['region']

    print('IP detail')
    print(f'IP : {IP}\nRegion : {region}\nCountry : {country}\nCity : {city}\nOrg : {org}')
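One detail worth checking in the test above: called with no argument, `get_location()` fetches ipinfo.io through Python's `urllib`, not through Chrome, so Chrome's proxy settings never apply to that request and it normally reports the VM's own address. Below is a minimal sketch of reading the IP as the browser sees it instead. It assumes ipinfo.io's `/json` endpoint returns raw JSON to the browser; `ipinfo_url`, `extract_json` and `browser_location` are hypothetical helper names, and `driver` is any Selenium WebDriver (the string `"tag name"` is the value of Selenium's `By.TAG_NAME`, used here so the sketch does not need to import Selenium).

```python
import json


def ipinfo_url(ip=None):
    """Build the ipinfo.io lookup URL, dropping any ':port' suffix like get_location() does."""
    if ip:
        return f"http://ipinfo.io/{str(ip).split(':')[0]}/json"
    return "http://ipinfo.io/json"


def extract_json(page_text):
    """Pull the JSON object out of the rendered page text.

    Slicing from the first '{' to the last '}' skips any extra text a
    browser JSON viewer may render around the payload.
    """
    start, end = page_text.find("{"), page_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in page text")
    return json.loads(page_text[start:end + 1])


def browser_location(driver):
    """Ask ipinfo.io for the caller's IP *through the browser*, so the
    request is sent via Chrome's --proxy-server setting."""
    driver.get(ipinfo_url())
    # "tag name" is the string value of selenium's By.TAG_NAME
    body_text = driver.find_element("tag name", "body").text
    return extract_json(body_text)


# Usage (assuming `driver` from one of the attempts above):
# data = browser_location(driver)
# print(data["ip"], data["org"])
```

If the proxy is actually in use, `data["ip"]` should be the proxy's outgoing address rather than the Compute Engine instance's.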

Thanks for reading!

asked Jul 05 '21 16:07 by Sébastien De Spiegeleer



1 Answer

I don't think the issue you're having is related to your code implementation. I'm sure that it is related to your use of a free proxy. These types of proxies are notorious for connection issues, such as timeouts caused by latency. They can also be intermittent, which means they can go down at any time. And some of them are abused, so they get blocked.
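Because free proxies often time out or respond slowly, a quick pre-flight check before launching Chrome can rule out a dead proxy early. The sketch below assumes a plain TCP connection to the proxy's host and port is a good enough first signal; it does not prove the proxy will actually relay HTTP or HTTPS traffic. `probe_proxy` is a hypothetical helper name.

```python
import socket
import time


def probe_proxy(proxy, timeout=5.0):
    """Try a TCP connection to a 'host:port' proxy.

    Returns the connect latency in seconds, or None if the connection was
    refused, timed out, or the host could not be resolved.
    """
    host, _, port = proxy.rpartition(":")
    start = time.perf_counter()
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return time.perf_counter() - start
    except OSError:  # covers refused connections, timeouts and DNS failures
        return None


# Usage:
# latency = probe_proxy("207.157.25.44:80")
# print("unreachable" if latency is None else f"connected in {latency:.2f}s")
```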

Your proxy is 207.157.25.44:80, which is shown in the image below.

[Image: the proxy 207.157.25.44:80 as listed on a free proxy website]

When I tested this code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

proxy_server = '207.157.25.44:80'

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--proxy-server=%s' % proxy_server)

# disable the banner "Chrome is being controlled by automated test software"
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])

driver = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)

driver.get('https://www.whatismyip.com/')

The Chrome browser opens, but it does not display any content.

[Image: Chrome window opened through the proxy, showing a blank page]

If I check the address 207.157.25.44:80 via an online proxy checker service, I get mixed results.

The image below shows that the proxy is not responding to any query type (HTTP, HTTPS, SOCKS4, SOCKS5).

[Image: proxy checker results showing no response on HTTP, HTTPS, SOCKS4 or SOCKS5]

When I ran the same check 5 minutes later, the proxy was up on HTTP but had latency issues.

[Image: proxy checker results 5 minutes later, with HTTP up but high latency]
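Since a single check can be misleading (down one moment, up with high latency a few minutes later), one option is to probe the proxy several times and summarize the results. A sketch under these assumptions: `probe` is any callable that returns a latency in seconds or None on failure (for example, a function timing `socket.create_connection` to the proxy), and `sample_proxy` is a hypothetical name.

```python
import statistics
import time


def sample_proxy(probe, proxy, attempts=5, interval=60.0):
    """Call probe(proxy) `attempts` times, `interval` seconds apart.

    `probe` must return a latency in seconds, or None on failure.
    Returns (success_ratio, median_latency), where median_latency is None
    if every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    latencies = []
    for i in range(attempts):
        if i:
            time.sleep(interval)  # space the checks out, like re-running an online checker
        result = probe(proxy)
        if result is not None:
            latencies.append(result)
    ratio = len(latencies) / attempts
    median = statistics.median(latencies) if latencies else None
    return ratio, median
```

A proxy that only succeeds on some attempts, or whose median latency is several seconds, is probably not worth pointing Chrome at.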

When I selected another proxy from the free proxy website:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

proxy_server = '47.184.133.79:3128'

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--proxy-server=%s' % proxy_server)

# disable the banner "Chrome is being controlled by automated test software"
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])

driver = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)

driver.get('https://www.whatismyip.com/')

I got a Cloudflare challenge page when connecting to whatismyip.com.

[Image: Cloudflare challenge page on whatismyip.com]

But when I tried the same proxy on nordvpn.com/what-is-my-ip, I got the proxy's IP address.

[Image: nordvpn.com/what-is-my-ip showing the proxy's IP address]

I would highly recommend testing any free proxy IP address multiple times to see whether it has any issues. Additionally, you need to add some error handling to your code to catch cases where a proxy goes offline, because free proxies can drop at any time.
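For the error-handling part, one approach is to try a list of candidate proxies and move on to the next one when a proxy fails. A minimal sketch: `first_working_proxy` and `open_page` are hypothetical names, where `open_page(proxy)` would start Chrome with `--proxy-server=<proxy>`, load a page, and raise on failure. With ChromeDriver, proxy failures typically surface as a `WebDriverException` (for example `net::ERR_PROXY_CONNECTION_FAILED`), and page-load timeouts as `TimeoutException`, which is a subclass of it.

```python
def first_working_proxy(proxies, open_page, errors=(Exception,)):
    """Return (proxy, result) for the first proxy where open_page succeeds.

    open_page(proxy) should raise one of `errors` when the proxy is down or
    too slow; those failures are recorded and the next proxy is tried.
    """
    failures = {}
    for proxy in proxies:
        try:
            return proxy, open_page(proxy)
        except errors as exc:
            failures[proxy] = exc  # keep the reason for logging/debugging
    raise RuntimeError(f"all {len(failures)} proxies failed: {failures}")


# Usage sketch with Selenium (not run here):
# from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
# from selenium.common.exceptions import WebDriverException
#
# def open_page(proxy):
#     opts = Options()
#     opts.add_argument(f"--proxy-server={proxy}")
#     driver = webdriver.Chrome(options=opts)
#     try:
#         driver.set_page_load_timeout(30)
#         driver.get("https://nordvpn.com/what-is-my-ip")
#         return driver.page_source
#     finally:
#         driver.quit()
#
# proxy, html = first_working_proxy(
#     ["207.157.25.44:80", "47.184.133.79:3128"], open_page,
#     errors=(WebDriverException,))
```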

If you need to use a proxy, I would strongly recommend a commercial proxy service, because those are more reliable than free proxy services.

  • oxylabs.io
  • Bright Data
answered Oct 19 '22 19:10 by Life is complex