
How to get a JSON response from a Google Chrome Selenium Webdriver client?

Currently I have Selenium hooked up to Python to scrape a webpage. I found out that the page actually pulls its data from a JSON API, and I can get a JSON response as long as I'm logged in to the page.

However, my approach to getting that response into Python seems a bit junky: I select the text enclosed in <pre> tags and use Python's json package to parse the data, like so:

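import json

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://jsonplaceholder.typicode.com/posts/1'
driver = webdriver.Chrome()
driver.get(url)
# Chrome wraps a raw JSON response in a <pre> element, so read its text.
# (find_element_by_css_selector was removed in Selenium 4.3; use By locators.)
json_text = driver.find_element(By.CSS_SELECTOR, 'pre').get_attribute('innerText')
json_response = json.loads(json_text)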

The only reason I need to select within <pre> tags at all is because when JSON appears in Chrome, it comes formatted like this:

<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}</pre>
</body>
</html>

And the only reason I need to do this inside Selenium at all is that I have to be logged in to the website to get a response. Otherwise I get a 401 and no data.

Roger Filmyer asked May 09 '16 17:05



1 Answer

You can find the pre element and get its text, then load it via json.loads():

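import json

from selenium.webdriver.common.by import By

# `driver` is the already-running WebDriver from the question.
pre = driver.find_element(By.TAG_NAME, "pre").text
data = json.loads(pre)
print(data)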

If this does not work as-is, try prepending view-source: to your URL, as suggested by @Skandix in the comments.


You can also avoid using Selenium to fetch the JSON itself: transfer the cookies from Selenium to requests so that the request stays logged in. See:

  • How do I load session and cookies from Selenium browser to requests library in Python?
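As a rough sketch of that cookie transfer (not taken from the linked answer): driver.get_cookies() returns a list of dicts, and each one can be copied into a requests.Session. The helper names copy_cookies and fetch_json are made up for illustration, and this assumes the site authenticates by cookies alone; if it also checks headers such as the user agent or a CSRF token, more would need to be copied over.

```python
def copy_cookies(selenium_cookies, session):
    """Copy cookies from driver.get_cookies() into a requests.Session.

    copy_cookies is a hypothetical helper name, not part of either library.
    """
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def fetch_json(driver, url):
    """Fetch url outside the browser, reusing the browser's login cookies.

    fetch_json is also a hypothetical name; driver must already be logged in.
    """
    import requests  # third-party; imported here so copy_cookies has no dependencies

    session = copy_cookies(driver.get_cookies(), requests.Session())
    response = session.get(url)
    response.raise_for_status()  # raises on a 401 if the cookies were not enough
    return response.json()
```

After logging in through the browser, calling fetch_json(driver, url) would then return the parsed JSON directly, with no <pre> element involved.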
alecxe answered Sep 24 '22 07:09