If I browse to https://httpbin.org/headers
I expect to get the following JSON response:
{
"headers": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "close",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
}
}
However, if I use Selenium
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
url = 'https://httpbin.org/headers'
driver.get(url)
print(driver.page_source)
driver.close()
I get
<html platform="linux" class="theme-light" dir="ltr"><head><meta http-equiv="Content-Security-Policy" content="default-src 'none' ; script-src resource:; "><link rel="stylesheet" type="text/css" href="resource://devtools-client-jsonview/css/main.css"><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="viewer-config" src="resource://devtools-client-jsonview/viewer-config.js"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="json-viewer" src="resource://devtools-client-jsonview/json-viewer.js"></script></head><body><div id="content"><div id="json">{
"headers": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "close",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
}
}
</div></div><script src="resource://devtools-client-jsonview/lib/require.js" data-main="resource://devtools-client-jsonview/viewer-config.js"></script></body></html>
Where do the HTML tags come from? How do I get the raw JSON response of an HTTP request from driver.page_source?
To request JSON from a URL, you need to send an HTTP GET request to the server and provide the Accept: application/json request header with your request. The Accept header tells the server that our client is expecting JSON.
In the Python requests library, response.json() returns the decoded JSON body as a Python object (typically a dict or a list). If the body is not valid JSON, it raises an error.
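As a rough sketch of the two points above (sending the Accept header and decoding the JSON body), here is a version that uses only the standard library rather than requests. The helper names build_json_request and fetch_json are made up for this example, and the 10-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_json_request(url):
    """Create a GET request whose Accept header asks the server for JSON."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json"},
        method="GET",
    )


def fetch_json(url, timeout=10):
    """Send the request and decode the body.

    json.loads raises json.JSONDecodeError if the body is not valid JSON.
    """
    with urllib.request.urlopen(build_json_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs network access):
# data = fetch_json("https://httpbin.org/headers")
```

Note that this bypasses the browser entirely, so for https://httpbin.org/headers the reported headers would be those of the Python client, not those of Firefox.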
driver.get(url) opens the URL in the browser. You can then read the page_source attribute to get the HTML of the current page and print it or process it further.
We can get the JavaScript-rendered HTML source by using Selenium WebDriver. Selenium can execute JavaScript with the executeScript method (execute_script in the Python bindings). The JavaScript to be executed is passed as a parameter to the method. To obtain the HTML, with JavaScript, we shall pass return document.
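A minimal sketch of that idea in Python follows. The sentence above is cut off, so the exact script is an assumption: document.documentElement.outerHTML is one common choice for getting the rendered markup. The function name rendered_html is hypothetical; it only relies on the driver having an execute_script method.

```python
def rendered_html(driver):
    """Return the current DOM serialized as HTML, as seen by JavaScript.

    Assumption: document.documentElement.outerHTML is used here because the
    original sentence is truncated after "return document.".
    """
    return driver.execute_script("return document.documentElement.outerHTML;")


# Usage with a real browser (requires selenium and geckodriver):
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("https://httpbin.org/headers")
# print(rendered_html(driver))
# driver.quit()
```

For a JSON URL in Firefox this would presumably still return the JSON viewer markup rather than the raw body, since it serializes the same DOM that page_source reports.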
Use the "view-source:" prefix in your URL.
Simple Mode:
example:
url = 'view-source:https://httpbin.org/headers'
driver.get(url)
content = driver.page_source
print(content)
output:
'<html><head><meta name="viewport" content="width=device-width"><title>https://httpbin.org/headers</title><link rel="stylesheet" type="text/css" href="resource://content-accessible/viewsource.css"></head><body id="viewsource" class="highlight" style="-moz-tab-size: 4"><pre>{\n "headers": {\n "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", \n "Accept-Encoding": "gzip, deflate, br", \n "Accept-Language": "en-US,en;q=0.5", \n "Host": "httpbin.org", \n "Upgrade-Insecure-Requests": "1", \n "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0"\n }\n}\n</pre></body></html>'
Best Mode: (for JSON)
example:
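import json
from selenium.webdriver.common.by import By

url = 'view-source:https://httpbin.org/headers'
driver.get(url)
# find_element_by_tag_name was removed in Selenium 4.3; use find_element(By.TAG_NAME, ...)
content = driver.find_element(By.TAG_NAME, 'pre').text
parsed_json = json.loads(content)
print(parsed_json)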
output:
{'headers': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.5',
'Host': 'httpbin.org',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0'}}
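If you only have the page_source string from the view-source page (as in Simple Mode), the same extraction can be sketched without a second call to the driver. This assumes the JSON sits inside a single pre element, as in the output shown earlier; json_from_pre is a hypothetical helper name.

```python
import html
import json
import re


def json_from_pre(page_source):
    """Decode the JSON text found inside the first <pre> element."""
    match = re.search(
        r"<pre\b[^>]*>(.*?)</pre>", page_source, re.DOTALL | re.IGNORECASE
    )
    if match is None:
        raise ValueError("no <pre> element found in page source")
    # Drop any nested markup, then undo HTML escaping such as &amp; or &quot;
    text = re.sub(r"<[^>]+>", "", match.group(1))
    return json.loads(html.unescape(text))


# Usage, with a driver that has already loaded the view-source: URL:
# parsed_json = json_from_pre(driver.page_source)
```

The regular expression is a shortcut that fits this very simple page; for anything more complex a real HTML parser would be the safer choice.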
In addition to the raw JSON response, driver.page_source also contains the HTML that Firefox adds to "pretty print" the response in the browser. You'll get the same result if you use the Firefox DOM and Style Inspector to view the source of the JSON response in the browser.
To get the raw JSON response you can navigate the HTML elements as usual:
from selenium.webdriver.common.by import By
print(driver.find_element(By.XPATH, "//div[@id='json']").text)