How to return a Selenium browser (or how to import a function that returns a Selenium browser)

Question

I would like to start a Selenium browser with a particular setup (Privoxy, Tor, random user agent...) in a function and then call this function in my code. I have created a Python script, mybrowser.py, containing:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from fake_useragent import UserAgent
from stem import Signal
from stem.control import Controller

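class MyBrowserClass:
    @staticmethod
    def start_browser():
        # Route PhantomJS traffic through Privoxy, which forwards it to Tor
        service_args = [
            '--proxy=127.0.0.1:8118',
            '--proxy-type=http',
        ]
        # Use a random user agent for this session
        dcap = dict(DesiredCapabilities.PHANTOMJS)
        dcap["phantomjs.page.settings.userAgent"] = UserAgent().random

        browser = webdriver.PhantomJS(service_args=service_args,
                                      desired_capabilities=dcap)
        return browser

    @staticmethod
    def set_new_ip():
        # Ask Tor for a new identity (NEWNYM) through its control port.
        # `password` (the Tor control password) must be defined before this is called.
        with Controller.from_port(port=9051) as controller:
            controller.authenticate(password=password)
            controller.signal(Signal.NEWNYM)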

Then I import it into another script myscraping.py with this inside:

import mybrowser
import time

browser = mybrowser.MyBrowserClass.start_browser()
browser.get("https://canihazip.com/s")
print(browser.page_source)
mybrowser.MyBrowserClass.set_new_ip()
time.sleep(12) 
browser.get("https://canihazip.com/s")
print(browser.page_source)

The browser is working - I can access the page and retrieve it with .page_source.

But the IP doesn't change between the first and the second print. If I move the content of the functions into myscraping.py (and remove the import and the function calls), then the IP changes.

Why? Is it a problem with returning the browser? How can I fix this?

Actually, the situation is a bit more complex. When I connect to https://check.torproject.org before and after the call to set_new_ip() and the 12-second wait (see the lines below), the IP reported by the page changes between the first and the second call. So my IP has changed (according to Tor), but neither https://httpbin.org/ip nor icanhazip.com detects the change.

...
browser.get("https://canihazip.com/s")
print(browser.page_source)
browser.get("https://check.torproject.org/")
print(browser.find_element_by_xpath('//div[@class="content"]').text)
mybrowser.MyBrowserClass.set_new_ip()
time.sleep(12)
browser.get("https://check.torproject.org/")
print(browser.find_element_by_xpath('//div[@class="content"]').text)
browser.get("https://canihazip.com/s")
print(browser.page_source)

The printed IPs look like this:

42.38.215.198   (canihazip before set_new_ip())
42.38.215.198   (check.torproject before set_new_ip())
106.184.130.30  (check.torproject after set_new_ip())
42.38.215.198   (canihazip after set_new_ip())

Privoxy configuration: in C:\Program Files (x86)\Privoxy\config.txt, I have uncommented this line (9050 is the port Tor uses):

forward-socks5t   /               127.0.0.1:9050 .

Tor configuration: in torrc, I have this:

ControlPort 9051
HashedControlPassword xxxx

Cole · Accepted Answer

This is probably because PhantomJS keeps a memory cache of requested content. Your first visit with a PhantomJS browser can return a dynamic result, but that result is then cached and each subsequent visit to the same page uses the cached copy.
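To make the caching effect described above concrete, here is a toy model in plain Python. It is only an illustration: ToyCachingBrowser and make_rotating_ip_service are hypothetical names, the model is not PhantomJS's actual implementation, and it ignores cache-control headers, which in a real browser help decide which pages are reused. The IPs are the ones printed in the question.

```python
class ToyCachingBrowser:
    """Toy stand-in for a browser with an in-memory page cache (hypothetical)."""

    def __init__(self, fetch, use_cache=True):
        self._fetch = fetch          # callable: url -> page body
        self._use_cache = use_cache
        self._cache = {}             # url -> previously fetched body

    def get(self, url):
        if self._use_cache and url in self._cache:
            # Served from memory: no new request is made, so a new
            # exit IP is never observed by the remote site.
            return self._cache[url]
        body = self._fetch(url)
        self._cache[url] = body
        return body


def make_rotating_ip_service(ips):
    """Fake 'what is my IP' endpoint: each real request reports the next IP."""
    ip_iter = iter(ips)

    def fetch(url):
        return next(ip_iter)

    return fetch


ips = ["42.38.215.198", "106.184.130.30"]
url = "https://canihazip.com/s"

cached = ToyCachingBrowser(make_rotating_ip_service(ips))
first_cached = cached.get(url)
second_cached = cached.get(url)      # cache hit, same body as before

uncached = ToyCachingBrowser(make_rotating_ip_service(ips), use_cache=False)
first_uncached = uncached.get(url)
second_uncached = uncached.get(url)  # real request, sees the new IP

print(first_cached, second_cached)      # 42.38.215.198 42.38.215.198
print(first_uncached, second_uncached)  # 42.38.215.198 106.184.130.30
```

With the cache enabled, the second visit repeats the first result even though the underlying service would now report a different IP, which matches the symptom in the question.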

This memory cache has caused issues such as CSRF tokens not changing on refresh, and I believe it is the root cause of your problem. The issue was reported and resolved in 2013, but the solution is a method, clearMemoryCache, found in PhantomJS's WebPage class. Sadly, we are dealing with a Selenium webdriver.PhantomJS instance.

So, unless I am overlooking something, it would be tough to access this method through Selenium's abstraction.

The only solution I see is to use another webdriver that doesn't have a memory cache like PhantomJS's. I tested with Chrome and it works perfectly:

103.***.**.***
72.***.***.***
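A minimal sketch of what a Chrome-based start_browser could look like, assuming a Selenium version whose webdriver.Chrome accepts options= and a chromedriver available on the PATH. The helper name build_chrome_arguments and its parameters are hypothetical, and the proxy address is the Privoxy one from the question. This is not necessarily the exact setup used for the test above.

```python
def build_chrome_arguments(proxy="127.0.0.1:8118", user_agent=None, headless=True):
    """Build Chrome command-line switches (hypothetical helper)."""
    args = []
    if headless:
        args.append("--headless")
    # Send traffic through the local HTTP proxy (Privoxy in the question).
    args.append("--proxy-server=http://" + proxy)
    if user_agent:
        args.append("--user-agent=" + user_agent)
    return args


def start_browser(user_agent=None):
    """Start Chrome with the switches above and return the driver."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_arguments(user_agent=user_agent):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


print(build_chrome_arguments(user_agent="ExampleAgent/1.0"))
```

Calling start_browser(user_agent=UserAgent().random) would keep the random user agent from the question; set_new_ip() can stay as it is, since it talks to Tor directly and does not depend on the browser.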

Also, as a side note, Selenium is phasing out PhantomJS:

UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead

Tags: python, definition, return, selenium, phantomjs

MagTun
