How to get a JSON response from a Google Chrome Selenium Webdriver client?

Tags: python, selenium-webdriver

Currently I have Selenium hooked up to Python to scrape a webpage. I found out that the page actually pulls its data from a JSON API, and I can get a JSON response as long as I'm logged in to the page.

However, my approach to getting that response into Python seems a bit junky: I select the text enclosed in <pre> tags and use Python's json package to parse it, like so:

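import json
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://jsonplaceholder.typicode.com/posts/1'
driver = webdriver.Chrome()
driver.get(url)
# Selenium 4 removed the find_element_by_* helpers; use find_element(By...) instead
json_text = driver.find_element(By.CSS_SELECTOR, 'pre').get_attribute('innerText')
json_response = json.loads(json_text)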

The only reason I need to select within <pre> tags at all is that when Chrome displays a raw JSON response, it wraps it in HTML like this:

<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}</pre>
</body>
</html>

And the only reason I need to do this inside Selenium at all is that I need to be logged in to the website to get a response. Otherwise I get a 401 and no data.

780

asked May 09 '16 17:05

Roger Filmyer

1 Answer

You can find the pre element, get its text, and then load it via json.loads():

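import json

from selenium.webdriver.common.by import By

# `driver` is the already logged-in WebDriver instance from the question
pre = driver.find_element(By.TAG_NAME, "pre").text
data = json.loads(pre)
print(data)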

If this does not work as-is, try prepending view-source: to your URL, as suggested by @Skandix in the comments.

Alternatively, you can avoid using selenium to fetch the JSON data at all: log in with selenium, then transfer the cookies from selenium to requests so that the requests session stays logged in. See:

How do I load session and cookies from Selenium browser to requests library in Python?
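
As a rough illustration of the cookie-transfer idea, here is a minimal sketch. The helper names `cookies_to_dict` and `fetch_json_with_browser_cookies` are hypothetical (they are not part of selenium or requests), and the sketch assumes the site authenticates through cookies alone, which may not hold for every site. It relies on `driver.get_cookies()` returning a list of dicts with `name` and `value` keys.

```python
def cookies_to_dict(selenium_cookies):
    """Reduce Selenium's list of cookie dicts to a plain {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def fetch_json_with_browser_cookies(driver, api_url):
    """Fetch api_url with requests, reusing the cookies of a logged-in driver.

    Assumption: cookie-based login is all the API checks. Sites that also
    require tokens or custom headers will need more than this.
    """
    # Imported here so the cookie helper above works without requests installed.
    import requests

    session = requests.Session()
    session.cookies.update(cookies_to_dict(driver.get_cookies()))
    # Some sites tie the login to the browser's User-Agent, so copy that too.
    session.headers["User-Agent"] = driver.execute_script(
        "return navigator.userAgent"
    )

    response = session.get(api_url, timeout=10)
    response.raise_for_status()  # surfaces a 401 if the cookies were not accepted
    return response.json()
```

After logging in through the browser, something like `data = fetch_json_with_browser_cookies(driver, url)` would return the parsed JSON directly, without going through the <pre> wrapper. Note that the name-to-value mapping drops each cookie's domain and path, which is usually fine when all requests go to the same site.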

134

answered Sep 24 '22 07:09

alecxe
