 

How to grab a value from a JSON response in Selenium (Python)

My page returns a JSON HTTP response that contains id: 14.

Is there a way to grab this value with Selenium in Python? I searched the web and could not find a solution, and now I wonder whether it is simply not possible. I could read the id from the database, but I am trying to avoid that. Please tell me if there is a way around this. Thank you.

Nro asked Oct 30 '14 19:10


2 Answers

The source of your difficulty is that when a browser receives raw JSON data, it wraps it in a small amount of HTML so that it can be displayed on the screen.

When I visit https://httpbin.org/user-agent in Firefox, for example, the following raw JSON appears in my browser window:

{"user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0"
}

But in fact Firefox (and Chrome) has wrapped the JSON in a bit of extra HTML in order to create a document it can actually display. Here is the HTML that Firefox wraps it in, which I can see right in the JavaScript console by evaluating the expression document.documentElement.innerHTML:

<head><link rel="alternate stylesheet" type="text/css"
 href="resource://gre-resources/plaintext.css" title="Wrap Long Lines"></head>
 <body><pre>{"user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:42.0)
 Gecko/20100101 Firefox/42.0"
}
</pre></body>
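To make the effect of that wrapper concrete, here is a minimal standard-library sketch. It assumes a wrapper shaped like the Firefox output above (with the user-agent string shortened), and the class name PreTextExtractor is my own invention for illustration. It shows that the wrapped document is not valid JSON as a whole, while the text inside the <pre> element is. This only illustrates the wrapping; it is not the approach I recommend.

```python
import json
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside a <pre>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


# Assumed sample: the Firefox wrapper shown above, user-agent shortened.
wrapped = (
    '<head><link rel="alternate stylesheet" type="text/css" '
    'href="resource://gre-resources/plaintext.css" title="Wrap Long Lines"></head>'
    '<body><pre>{"user-agent": "Mozilla/5.0"}\n</pre></body>'
)

# The whole document starts with "<", so it cannot be parsed as JSON.
try:
    json.loads(wrapped)
    raw_parse_ok = True
except json.JSONDecodeError:
    raw_parse_ok = False

# The text inside <pre> is the original JSON payload.
extractor = PreTextExtractor()
extractor.feed(wrapped)
extractor.close()
data = json.loads("".join(extractor.chunks))

print(raw_parse_ok, data)
```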

Using BeautifulSoup to parse the HTML, as suggested in another answer, has two disadvantages: it adds a new dependency to your project, and it is slower than using the DOM that the browser has already built by parsing the HTML for you.

To have the browser extract the JSON for you, simply ask it for the text inside the <body> element. All of the extra structure that the browser added is excluded, and the pure JSON is returned:

from selenium.webdriver.common.by import By
driver.find_element(By.TAG_NAME, 'body').text  # visible text of <body>, i.e. the raw JSON

Or, if you want it parsed into a Python data structure:

import json
from selenium.webdriver.common.by import By
data = json.loads(driver.find_element(By.TAG_NAME, 'body').text)
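If you want a clearer failure when the page turns out not to be JSON, the same idea can be wrapped in a small helper. This is only a sketch: the function names parse_json_text and json_from_driver are hypothetical, and it assumes driver is an already-created Selenium WebDriver that has loaded the JSON URL.

```python
import json


def parse_json_text(text):
    """Parse the text of <body> as JSON, raising ValueError if it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Show the start of the text so an HTML error page is easy to recognise.
        raise ValueError(f"page body is not valid JSON: {text[:80]!r}") from exc


def json_from_driver(driver):
    """Read the visible text of <body> from a WebDriver and parse it as JSON."""
    # Imported here so parse_json_text stays usable without Selenium installed.
    from selenium.webdriver.common.by import By

    return parse_json_text(driver.find_element(By.TAG_NAME, "body").text)
```

For the page in the question, json_from_driver(driver)["id"] should then give the id, assuming the response is a JSON object with an "id" key. Some browsers may render JSON through a built-in viewer instead of a plain <pre> wrapper, in which case the body text is no longer the raw JSON and the ValueError makes that visible.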
Brandon Rhodes answered Sep 28 '22 20:09


You can use BeautifulSoup to parse the page and extract the JSON. The code you need should look something like this. You may need to change the soup.find call if the JSON isn't directly in the body of the response.

from bs4 import BeautifulSoup
import json

soup = BeautifulSoup(driver.page_source, "html.parser")  # name the parser explicitly
dict_from_json = json.loads(soup.find("body").text)
RobinL answered Sep 28 '22 19:09