
How to get the raw JSON response of an HTTP request from `driver.page_source` in Selenium webdriver Firefox

If I browse to https://httpbin.org/headers I expect to get the following JSON response:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.5", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}

However, if I use Selenium:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('-headless')  # run Firefox without a visible window
driver = webdriver.Firefox(options=options)

url = 'https://httpbin.org/headers'
driver.get(url)
print(driver.page_source)
driver.close()

I get

<html platform="linux" class="theme-light" dir="ltr"><head><meta http-equiv="Content-Security-Policy" content="default-src 'none' ; script-src resource:; "><link rel="stylesheet" type="text/css" href="resource://devtools-client-jsonview/css/main.css"><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="viewer-config" src="resource://devtools-client-jsonview/viewer-config.js"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="json-viewer" src="resource://devtools-client-jsonview/json-viewer.js"></script></head><body><div id="content"><div id="json">{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.5", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}
</div></div><script src="resource://devtools-client-jsonview/lib/require.js" data-main="resource://devtools-client-jsonview/viewer-config.js"></script></body></html>

Where do the HTML tags come from? How do I get the raw JSON response of an HTTP request from driver.page_source?

finefoot asked Jan 03 '19 14:01


2 Answers

Use the "view-source:" prefix in your URL.

Simple Mode:

example:

url = 'view-source:https://httpbin.org/headers'
driver.get(url)
content = driver.page_source
print(content)

output:

'<html><head><meta name="viewport" content="width=device-width"><title>https://httpbin.org/headers</title><link rel="stylesheet" type="text/css" href="resource://content-accessible/viewsource.css"></head><body id="viewsource" class="highlight" style="-moz-tab-size: 4"><pre>{\n  "headers": {\n    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", \n    "Accept-Encoding": "gzip, deflate, br", \n    "Accept-Language": "en-US,en;q=0.5", \n    "Host": "httpbin.org", \n    "Upgrade-Insecure-Requests": "1", \n    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0"\n  }\n}\n</pre></body></html>'

Best Mode: (for JSON)

example:

import json
from selenium.webdriver.common.by import By

url = 'view-source:https://httpbin.org/headers'
driver.get(url)
# The view-source page wraps the raw response body in a <pre> element.
content = driver.find_element(By.TAG_NAME, 'pre').text
parsed_json = json.loads(content)
print(parsed_json)

output:

{'headers': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Encoding': 'gzip, deflate, br',
  'Accept-Language': 'en-US,en;q=0.5',
  'Host': 'httpbin.org',
  'Upgrade-Insecure-Requests': '1',
  'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0'}}
Jonathan answered Nov 15 '22 04:11


In addition to the raw JSON response, driver.page_source also contains the HTML that Firefox's built-in JSON viewer adds to "pretty print" the response in the browser. You'll get the same result if you use the Firefox DOM and Style Inspector to view the source of the JSON response in the browser.

To get the raw JSON response you can navigate the HTML elements as usual:

from selenium.webdriver.common.by import By
print(driver.find_element(By.XPATH, "//div[@id='json']").text)
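
If you already have the driver.page_source string and would rather not make a second WebDriver call, the wrapper HTML can also be stripped offline. The following is only a minimal sketch using Python's standard html.parser module. The names json_from_page_source and _ElementText are hypothetical helpers made up for illustration and are not part of Selenium. It assumes the JSON sits inside a single element, such as the div with id "json" from the JSON viewer or the pre element from the view-source page.

```python
import json
from html.parser import HTMLParser


class _ElementText(HTMLParser):
    """Collect the text inside the first element matching a tag and optional id.

    Hypothetical helper for illustration only.
    """

    def __init__(self, tag, element_id=None):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Count nested elements of the same tag so we stop at the right close tag.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and (
            self.element_id is None or dict(attrs).get("id") == self.element_id
        ):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def json_from_page_source(page_source, tag, element_id=None):
    """Parse the JSON text found inside the first matching element of page_source."""
    parser = _ElementText(tag, element_id)
    parser.feed(page_source)
    parser.close()
    # json.loads raises json.JSONDecodeError if the element was missing or not JSON.
    return json.loads("".join(parser.chunks))
```

With the JSON viewer page from the question this would be called as json_from_page_source(driver.page_source, "div", "json"), and with a view-source: URL as json_from_page_source(driver.page_source, "pre"). The wrapper markup is a Firefox implementation detail and may differ between versions, so treat both the tag and the id as assumptions to verify against your own output.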
finefoot answered Nov 15 '22 03:11