I am using Python Django to create a web app. I am using Selenium to launch a headless browser (PhantomJS) and make some clicks until I reach a particular page. I want to capture the network traffic and get the response of a particular network call. This network call actually returns an HTML document as its response.
Is there any way to achieve this?
We can capture the network traffic of a specific page using Selenium WebDriver in Python. To do this, we use the JavaScript executor: Selenium can run JavaScript commands through the execute_script method.
Python has a package called selenium-wire, which you can use to capture the network traffic and also validate it. selenium-wire is an extended version of Selenium: it keeps all of Selenium's capabilities and adds an API for capturing and inspecting network requests. The following article covers it: https://sensoumya94. medium.
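A minimal sketch of the selenium-wire approach, assuming selenium-wire is installed (pip install selenium-wire) and that the call you care about can be recognized by a substring of its URL ("/target-endpoint" below is a hypothetical placeholder). selenium-wire records captured traffic in driver.requests; a request's .response is None until it has been answered, and the body is raw bytes that may still be compressed, so the sketch undoes gzip/deflate with the standard library and leaves other encodings (such as br) untouched. The helper names are illustrative, not part of selenium-wire.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo gzip or (zlib-wrapped) deflate compression; other encodings are returned as-is."""
    encoding = (content_encoding or "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    return body


def find_response_html(driver, url_part):
    """Return the decoded body of the first answered request whose URL contains url_part.

    `driver` is assumed to be a seleniumwire.webdriver instance (it exposes `driver.requests`).
    """
    for request in driver.requests:
        if url_part in request.url and request.response is not None:
            response = request.response
            raw = decode_body(response.body, response.headers.get("Content-Encoding"))
            return raw.decode("utf-8", errors="replace")
    return None


def capture_example(start_url, url_part):
    """Hypothetical end-to-end usage; imports selenium-wire only when called."""
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(start_url)
        # ... perform the clicks that trigger the network call here ...
        return find_response_html(driver, url_part)
    finally:
        driver.quit()
```

For example, capture_example("https://example.com", "/target-endpoint") would return the HTML of the first matching response, or None if no such request was captured.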
With the JavaScript executor approach, the JavaScript command to be executed is passed as a parameter to execute_script. To capture the network traffic, pass the command return window.performance.getEntries() to it. Note that this returns Resource Timing entries (each resource's URL, type, and timings), not the response bodies.
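A minimal sketch of the execute_script approach. As a defensive assumption, it serializes the entries with JSON.stringify inside the page (PerformanceEntry objects expose their fields through toJSON()) so Python always receives plain data, then keeps only "resource" entries whose URL contains a given substring. The function name and the returned tuple shape are illustrative choices.

```python
import json

# Serialize inside the page so Selenium hands back a plain JSON string.
ENTRIES_SCRIPT = "return JSON.stringify(window.performance.getEntries());"


def get_resource_entries(driver, url_part=""):
    """Return (url, initiatorType, duration) for resource entries whose URL contains url_part.

    `driver` is any Selenium WebDriver; the entries carry URLs and timings, never bodies.
    """
    entries = json.loads(driver.execute_script(ENTRIES_SCRIPT))
    return [
        (e["name"], e.get("initiatorType"), e.get("duration"))
        for e in entries
        if e.get("entryType") == "resource" and url_part in e["name"]
    ]
```

This is enough to confirm that a particular call was made (and when), but fetching its HTML body needs one of the other approaches on this page.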
You can use a proxy to capture the network traffic; browsermob-proxy works well with Selenium in Python. You need to download the BrowserMob Proxy executable first. This is the piece of code with Firefox:

Browsermob is the right way. I must understand how browsermob works, and Tor too. For Tor you must enable the HTTPTunnelPort configuration like this.
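A minimal sketch of the BrowserMob Proxy route, assuming the browsermob-proxy Python client (pip install browsermob-proxy) and a downloaded BrowserMob Proxy launcher at a path you supply. It uses Chrome rather than Firefox. The proxy records traffic as a HAR; with captureContent enabled, each entry's response.content.text holds the body (base64-encoded when content.encoding says so). The helper names are hypothetical, and only the HAR parser is exercised without a browser.

```python
import base64


def har_response_text(har, url_part):
    """Return the body text of the first HAR entry whose request URL contains url_part."""
    for entry in har["log"]["entries"]:
        if url_part not in entry["request"]["url"]:
            continue
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            continue  # body not captured for this entry; keep looking
        if content.get("encoding") == "base64":
            return base64.b64decode(text).decode("utf-8", errors="replace")
        return text
    return None


def capture_with_browsermob(bmp_path, start_url, url_part):
    """Hypothetical end-to-end helper; bmp_path points at the browsermob-proxy launcher."""
    from browsermobproxy import Server
    from selenium import webdriver

    server = Server(bmp_path)
    server.start()
    proxy = server.create_proxy()
    options = webdriver.ChromeOptions()
    options.add_argument(f"--proxy-server={proxy.proxy}")
    options.add_argument("--ignore-certificate-errors")  # the proxy re-signs HTTPS traffic
    driver = webdriver.Chrome(options=options)
    try:
        proxy.new_har("capture", options={"captureHeaders": True, "captureContent": True})
        driver.get(start_url)
        # ... perform the clicks that trigger the network call here ...
        return har_response_text(proxy.har, url_part)
    finally:
        driver.quit()
        server.stop()
```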
You can get access to the browser or chromedriver logs; they differ slightly when it comes to network responses. The browser log is called 'performance' and the driver log is called 'driver'. driver.get_log() returns a list of dicts whose 'message' field is a JSON string, which you can parse to extract the events with Network methods inside them:
{'level': 'INFO',
'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
'timestamp': 1538607113832},
{'level': 'INFO',
'message': '{"message":{"method":"Page.frameDetached","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
'timestamp': 1538607113838},
{'level': 'INFO',
'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response","frameId":"C2D13BD13CF743B6D0695B35E9CC935C","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"5331BFDC4F466FCED920CFC9F033D2EC","request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"},"initialPriority":"VeryHigh","method":"GET","mixedContentType":"none","referrerPolicy":"no-referrer-when-downgrade","url":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response"},"requestId":"5331BFDC4F466FCED920CFC9F033D2EC","timestamp":104499.729,"type":"Document","wallTime":1538607113.838206}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
'timestamp': 1538607113839},...}
You need to enable performance logging through the 'goog:loggingPrefs' capability and then parse each entry with the json module:
import json
from selenium import webdriver

# Enable Chrome's performance log, which records DevTools events (including Network.*).
# Selenium 4.10+ no longer accepts desired_capabilities=, so the capability is set on the options.
options = webdriver.ChromeOptions()
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=options)
driver.get('https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response')

def process_browser_log_entry(entry):
    # Each entry's 'message' is a JSON string; the DevTools event sits under its 'message' key.
    response = json.loads(entry['message'])['message']
    return response

browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
# Keep Network.responseReceived (and related) events, which carry each response's URL and requestId.
events = [event for event in events if 'Network.response' in event['method']]
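Since the call in the question returns an HTML document, a natural next step is to narrow the performance log down to HTML responses. A minimal sketch, assuming the target response is served with the MIME type text/html; it takes the raw list from driver.get_log('performance') and returns (requestId, url, status) for each matching Network.responseReceived event. The function name is illustrative.

```python
import json


def html_responses(performance_log):
    """List (requestId, url, status) for Network.responseReceived events with mimeType text/html."""
    found = []
    for entry in performance_log:
        # Same unwrapping as process_browser_log_entry: JSON string -> inner DevTools event.
        event = json.loads(entry["message"])["message"]
        if event["method"] != "Network.responseReceived":
            continue
        response = event["params"]["response"]
        if response.get("mimeType") == "text/html":
            found.append((event["params"]["requestId"], response["url"], response.get("status")))
    return found
```

The requestId in each tuple is what Chrome expects when you later ask for that response's body.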
The events themselves don't include the response body, but they do give you each response's URL and requestId.
Another option is to use a library like selenium-wire.
UPDATE 2020-10-07 ⬇
As @Roey B and @Inactivist explain in the comments, you can access the response body using the Network.getResponseBody command:
driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': events[0]["params"]["requestId"]})
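A slightly fuller sketch of the same idea, assuming a Chromium-based driver (execute_cdp_cmd exists on Chrome and Edge drivers) and the events list built earlier. Network.getResponseBody returns a dict with 'body' and 'base64Encoded', so the helper decodes base64 when Chrome flags it. Helper names are illustrative; in practice the command can fail for a requestId whose body Chrome no longer holds, so fetch the body soon after the response arrives.

```python
import base64


def response_body(driver, request_id):
    """Fetch one response body over CDP, decoding it when Chrome returns base64."""
    result = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
    if result.get("base64Encoded"):
        return base64.b64decode(result["body"]).decode("utf-8", errors="replace")
    return result["body"]


def bodies_for(driver, events, url_part):
    """Map URL -> body for Network.responseReceived events whose URL contains url_part."""
    bodies = {}
    for event in events:
        if event["method"] != "Network.responseReceived":
            continue  # skips e.g. Network.responseReceivedExtraInfo
        url = event["params"]["response"]["url"]
        if url_part in url:
            bodies[url] = response_body(driver, event["params"]["requestId"])
    return bodies
```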