 

Selenium - python. how to capture network traffic's response [duplicate]

I am using Python and Django to build a web app. I use Selenium to launch a headless browser (PhantomJS) and click through until I reach a particular page. I want to capture the network traffic and get the response of one particular network call. The response to that call is an HTML document.

Is there any way to achieve this?

Rich Rajah asked Oct 03 '18 18:10


1 Answer

You can access either the browser log or the chromedriver log; the two differ slightly in how they report network responses. The browser log is called performance and the driver log is called driver. Both return a JSON-like structure, which you can parse to extract the events whose method belongs to the Network domain:

{'level': 'INFO',
  'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113832},
 {'level': 'INFO',
  'message': '{"message":{"method":"Page.frameDetached","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113838},
 {'level': 'INFO',
  'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response","frameId":"C2D13BD13CF743B6D0695B35E9CC935C","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"5331BFDC4F466FCED920CFC9F033D2EC","request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"},"initialPriority":"VeryHigh","method":"GET","mixedContentType":"none","referrerPolicy":"no-referrer-when-downgrade","url":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response"},"requestId":"5331BFDC4F466FCED920CFC9F033D2EC","timestamp":104499.729,"type":"Document","wallTime":1538607113.838206}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113839},...}

You need to enable performance logging through DesiredCapabilities and then parse the log entries with the json module:

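import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the shared capabilities dict so the class-level default is not mutated.
caps = DesiredCapabilities.CHROME.copy()
# Ask Chrome to record performance (DevTools) events, which include network traffic.
caps['goog:loggingPrefs'] = {'performance': 'ALL'}
# Note: newer Selenium 4 releases drop the desired_capabilities argument; there,
# set the same capability with ChromeOptions.set_capability(...) and pass options=.
driver = webdriver.Chrome(desired_capabilities=caps)
driver.get('https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response')

def process_browser_log_entry(entry):
    # Each log entry stores the DevTools event as a JSON string under 'message'.
    response = json.loads(entry['message'])['message']
    return response

browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
# Keep only response-related events such as Network.responseReceived.
events = [event for event in events if 'Network.response' in event['method']]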

I don't know whether you can access the response data itself this way, but you can get the URL of each response.
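
To illustrate pulling URLs out of the parsed events, here is a minimal sketch. It assumes `events` has the shape produced by the code above and that `Network.responseReceived` events carry the URL under `params.response.url`, as in the Chrome DevTools Protocol. The helper name `response_urls` and the sample data are made up for the example.

```python
def response_urls(events):
    """Return (requestId, url, status) for every Network.responseReceived event."""
    results = []
    for event in events:
        if event.get('method') != 'Network.responseReceived':
            continue
        params = event.get('params', {})
        response = params.get('response', {})
        results.append(
            (params.get('requestId'), response.get('url'), response.get('status'))
        )
    return results


# Hypothetical sample shaped like parsed performance-log messages.
sample_events = [
    {'method': 'Network.requestWillBeSent',
     'params': {'requestId': '1', 'request': {'url': 'https://example.com/'}}},
    {'method': 'Network.responseReceived',
     'params': {'requestId': '1',
                'response': {'url': 'https://example.com/', 'status': 200}}},
]

print(response_urls(sample_events))
```

The `requestId` returned alongside each URL is what ties a response back to its request, so it is worth keeping if you plan to look up more details later.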

Another option is to use a library like selenium-wire.
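
As a rough sketch of the selenium-wire route: the package is installed separately (for example with `pip install selenium-wire`) and wraps the Selenium driver, recording traffic in `driver.requests`. Each recorded request has a `url` and, once answered, a `response` whose `body` is bytes. The helper names, the URL fragment, and the fake request objects below are placeholders for illustration, and the body may still be compressed depending on the response's Content-Encoding header.

```python
from types import SimpleNamespace


def find_response_body(requests, url_fragment):
    """Return the body (bytes) of the first answered request whose URL contains url_fragment."""
    for request in requests:
        if url_fragment in request.url and request.response is not None:
            return request.response.body
    return None


def capture_with_selenium_wire(page_url, url_fragment):
    """Load page_url in Chrome via selenium-wire and return the matching response body."""
    # Imported here so the helper above works without selenium-wire installed.
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        return find_response_body(driver.requests, url_fragment)
    finally:
        driver.quit()


# Stand-ins for selenium-wire request objects, used only to show the helper's behaviour.
fake_requests = [
    SimpleNamespace(url='https://example.com/style.css', response=None),
    SimpleNamespace(url='https://example.com/api/page',
                    response=SimpleNamespace(body=b'<html>ok</html>')),
]
print(find_response_body(fake_requests, '/api/page'))
```

With a real browser you would call `capture_with_selenium_wire(...)` after performing whatever clicks lead to the page of interest; the matching logic stays the same.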

UPDATE 2020-10-07 ⬇

As @Roey B and @Inactivist explain in the comments, you can access the response body using the Network.getResponseBody command:

driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': events[0]["params"]["requestId"]})
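
Building on that command, here is a sketch of fetching the body for one specific URL rather than for `events[0]`. `Network.getResponseBody` returns a dict with `body` and `base64Encoded` keys, so binary payloads need decoding. The helper names and the `url_fragment` parameter are illustrative, `driver` and `events` are assumed to be the objects created earlier, and the call can fail if Chrome has already discarded the response data.

```python
import base64


def find_request_id(events, url_fragment):
    """Return the requestId of the first Network.responseReceived event matching url_fragment."""
    for event in events:
        if event.get('method') != 'Network.responseReceived':
            continue
        params = event.get('params', {})
        if url_fragment in params.get('response', {}).get('url', ''):
            return params.get('requestId')
    return None


def decode_body(result):
    """Turn a Network.getResponseBody result into bytes."""
    if result.get('base64Encoded'):
        return base64.b64decode(result['body'])
    # Text bodies are encoded as UTF-8 so the return type is always bytes.
    return result['body'].encode('utf-8')


def get_response_body(driver, events, url_fragment):
    """Fetch and decode the body of the first matching response, or None if nothing matches."""
    request_id = find_request_id(events, url_fragment)
    if request_id is None:
        return None
    result = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
    return decode_body(result)
```

For the HTML document in the question, something like `get_response_body(driver, events, '/some/path').decode('utf-8')` would give the markup as a string, where `/some/path` stands for whatever part of the URL identifies the call.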

hellpanderr answered Oct 06 '22 15:10