
Selenium Python - Get Network response body

I use Selenium to react to data that a website receives in response to a GET request. The API called by the website is not public, so if I request the URL directly, I get {"message":"Unauthenticated."}.

So far, I have only managed to retrieve the response headers.

I found here that using driver.execute_cdp_cmd('Network.getResponseBody', {...}) might be a solution to my problem.

Here is a sample of my code:

import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
driver = webdriver.Chrome(
    r"./chromedriver",
    desired_capabilities=capabilities,
)

def processLog(log):
    log = json.loads(log["message"])["message"]
    if ("Network.response" in log["method"] and "params" in log.keys()):
        headers = log["params"]["response"]
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
        print(json.dumps(body, indent=4, sort_keys=True))
        return log["params"]
        

logs = driver.get_log('performance')
responses = [processLog(log) for log in logs]

Unfortunately, the call to driver.execute_cdp_cmd('Network.getResponseBody', {...}) fails with:

unknown error: unhandled inspector error: {"code":-32000,"message":"No resource with given identifier found"}

Do you know what I am missing, or how else I could retrieve the response body?

Thank you for your help!

Seglinglin asked Dec 17 '20 23:12



2 Answers

To retrieve the response body, you have to listen specifically for Network.responseReceived events:

def processLog(log):
    log = json.loads(log["message"])["message"]
    # Match the method exactly: a substring test would also match
    # "Network.responseReceivedExtraInfo" events.
    if log["method"] == "Network.responseReceived" and "params" in log:
        body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
        return body
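The filtering step can also be separated from the driver call, which makes it easier to check which requests are actually being matched. Below is a minimal sketch, assuming the performance log entries have the shape used in the question (a "message" key holding a JSON string). The helper name response_request_ids and the url_part parameter are hypothetical and not part of Selenium.

```python
import json


def response_request_ids(entries, url_part=""):
    """Return (requestId, url) pairs for Network.responseReceived events.

    `entries` is a list of performance log entries, for example the
    result of driver.get_log("performance").
    `url_part` is an optional substring the response URL must contain.
    """
    found = []
    for entry in entries:
        message = json.loads(entry["message"])["message"]
        # Exact comparison, so that "Network.responseReceivedExtraInfo"
        # and other Network.* events are skipped.
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        url = params.get("response", {}).get("url", "")
        if url_part in url:
            found.append((params["requestId"], url))
    return found
```

Each returned requestId can then be passed to driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': ...}), for example after calling response_request_ids(driver.get_log('performance'), "/api/"), where "/api/" stands in for whatever part of the URL identifies the request you care about.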

However, I ended up using a different approach that relies on requests. I copied the authorization token from the browser's developer tools (Network > Headers > Request Headers > Authorization) and used it to get the data I wanted:

import requests

def get_data():
    url = "<your_url>"
    headers = {
        "Authorization": "Bearer <your_access_token>",
        "Content-type": "application/json"
    }
    params = {
        # "<key>": "<value>",
        # ...
    }

    r = requests.get(url, headers=headers, params=params)

    # Returns None implicitly when the request does not succeed.
    if r.status_code == 200:
        return r.json()

Seglinglin answered Nov 14 '22 22:11



Some responses probably don't have a body, so Selenium raises the "No resource with given identifier found" error for them. The error message is a bit ambiguous here.

Try something like this:

from selenium.common import exceptions

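try:
    # `driver` is the webdriver.Chrome instance from the question
    body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': log["params"]["requestId"]})
    log['body'] = body
except exceptions.WebDriverException:
    # Raised when no body is available for this requestId
    print('response.body is null')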

This way, responses without a body will not crash your script.
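When the call does succeed, the result of Network.getResponseBody is a dict with a "body" string and a "base64Encoded" flag, so the payload may still need decoding before it can be used. Here is a minimal sketch of that step; the helper names decode_cdp_body and parse_json_body are hypothetical, and treating the body as UTF-8 JSON is an assumption about the API in the question.

```python
import base64
import json


def decode_cdp_body(result):
    """Return the payload of a Network.getResponseBody result.

    Returns bytes when the body was base64-encoded, otherwise the
    body string unchanged.
    """
    raw = result.get("body", "")
    if result.get("base64Encoded"):
        return base64.b64decode(raw)
    return raw


def parse_json_body(result):
    """Parse the payload as JSON, or return None if it is not valid JSON."""
    data = decode_cdp_body(result)
    try:
        if isinstance(data, bytes):
            # Assumption: the API sends UTF-8 encoded text.
            data = data.decode("utf-8")
        return json.loads(data)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return None
```

In the snippet above, parse_json_body(body) could replace the direct assignment to log['body'] if only the parsed data is needed.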

Recently_Created_User answered Nov 14 '22 22:11
