
How to get the JavaScript-rendered HTML source using Selenium

I run a query on a web page and get a result URL. If I right-click and view the HTML source in the browser, I can see the HTML generated by JavaScript. If I simply use urllib, Python does not get the JavaScript-generated HTML, so I looked at solutions that use Selenium. Here's my code:

from selenium import webdriver
url = 'http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2'
# Raw string keeps the backslashes in the Windows path literal.
driver = webdriver.PhantomJS(executable_path=r'C:\python27\scripts\phantomjs.exe')
driver.get(url)
print driver.page_source

This prints:

<html><head></head><body></body></html>

which is obviously not right.

Here's the source I see in the browser's right-click "view source" window (I want the INFORMATION part):

</script></div><div class="searchColRight"><div id="topActions" class="clearfix 
noPrint"><div id="breadcrumbs" class="left"><a title="Results Summary"
href="Default.aspx?    _act=VitalSearchR ...... <<INFORMATION I NEED>> ... 
to view the entire record.</p></div><script xmlns:msxsl="urn:schemas-microsoft-com:xslt">

        jQuery(document).ready(function() {
            jQuery(".ancestry-information-tooltip").actooltip({
href: "#AncestryInformationTooltip", orientation: "bottomleft"});
        });

So my question is: how do I get the information generated by JavaScript?

MacSanhe asked Mar 30 '14 02:03




2 Answers

You need to read the document through JavaScript; you can use Selenium's execute_script function:

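from time import sleep  # this should go at the top of the file

sleep(5)  # crude fixed wait to give the page's JavaScript time to run
# Ask the browser for the live DOM rather than the originally served source.
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print html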

That returns everything inside the <html> tag.
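
A fixed sleep(5) can be too long on a fast page and too short on a slow one. As a rough sketch of an alternative, the helper below polls the live DOM until a known piece of the rendered page shows up. The name rendered_html, its parameters, and the choice of "searchColRight" (a class taken from the markup in the question) as the marker are illustrative assumptions, not part of the original answer. The helper relies only on the driver having an execute_script method.

```python
import time

# Returns the serialized live DOM, including anything JavaScript added.
JS_OUTER_HTML = "return document.documentElement.outerHTML"


def rendered_html(driver, marker, timeout=10.0, poll=0.5):
    """Poll the page until `marker` appears in the DOM, then return the HTML.

    `driver` is any object with Selenium's execute_script(script) method.
    `marker`, `timeout` and `poll` are hypothetical knobs for this sketch.
    """
    deadline = time.time() + timeout
    while True:
        html = driver.execute_script(JS_OUTER_HTML)
        if html and marker in html:
            return html
        if time.time() >= deadline:
            raise RuntimeError("%r not found after %s seconds" % (marker, timeout))
        time.sleep(poll)
```

With the driver from the question, this might be called as html = rendered_html(driver, 'searchColRight') right after driver.get(url).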

Victory answered Oct 24 '22 18:10


That workaround isn't necessary; you can instead use:

driver = webdriver.PhantomJS()
driver.get('http://www.google.com/')
html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
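
The find_element_by_tag_name call belongs to older Selenium releases; Selenium 4 moved to a generic find_element(by, value) call and later dropped the find_element_by_* helpers. As a hedged sketch, the helper below (the name inner_html is made up for illustration) tries the older method first and falls back to the newer form. It passes the locator strategy as the plain string "tag name", which is the value of By.TAG_NAME in selenium.webdriver.common.by, so the snippet needs no imports of its own.

```python
def inner_html(driver, tag="html"):
    """Return the innerHTML of the first <tag> element in the live DOM.

    Hypothetical helper for this sketch; `driver` is a Selenium WebDriver.
    """
    if hasattr(driver, "find_element_by_tag_name"):
        # Older Selenium, as used in the answer.
        element = driver.find_element_by_tag_name(tag)
    else:
        # Newer Selenium: "tag name" is the value of By.TAG_NAME.
        element = driver.find_element("tag name", tag)
    return element.get_attribute("innerHTML")
```

After driver.get(...), this would be used as html = inner_html(driver).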

Darius answered Oct 24 '22 18:10