
How to get innerHTML of whole page in selenium driver?

Tags:

selenium

I'm using Selenium to navigate to the web page I want, and then parsing the page with Beautiful Soup.

Somebody has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.

Sample code in Python (based on the post above, the language doesn't seem to matter much):

    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    from bs4 import BeautifulSoup

    url = 'http://www.google.com'
    driver = webdriver.Firefox()
    driver.get(url)

    the_html = driver---somehow----.get_attribute('innerHTML')
    bs = BeautifulSoup(the_html, 'html.parser')
YJZ asked Mar 10 '16 00:03


People also ask

What is getAttribute innerHTML?

Reading innerHTML returns the HTML content of a web element. innerHTML is a property of the element equal to the markup between its opening and closing tags. In Selenium, pass 'innerHTML' to the element's get_attribute method (getAttribute in the Java bindings).

How do I get the HTML code for a website using Selenium?

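To get the HTML source of a WebElement in Selenium WebDriver, use the element's get_attribute method in the Python bindings. First, locate the element with a driver locator call such as find_element(By.XPATH, ...) or find_element(By.CSS_SELECTOR, ...); the older find_element_by_xpath and find_element_by_css_selector helpers were deprecated in Selenium 4 and later removed. For the whole page rather than a single element, read driver.page_source.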


1 Answer

To get the HTML for the whole page:

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("http://stackoverflow.com")

    html = driver.page_source
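
When the page is reached by clicking rather than by `driver.get`, the new document may still be loading at the moment `page_source` is read. The following is a minimal sketch, assuming that polling `document.readyState` is an adequate readiness signal for the page in question; pages that keep rendering through JavaScript afterwards would likely need a wait on a specific element instead. `wait_for_page_source`, `timeout`, and `poll` are hypothetical names for illustration, not part of the Selenium API.

```python
import time


def wait_for_page_source(driver, timeout=10.0, poll=0.2):
    """Return driver.page_source once document.readyState is 'complete'.

    `driver` is any object exposing Selenium's `execute_script` method and
    `page_source` attribute. Raises TimeoutError if the document is not
    ready before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser how far the current document has loaded.
        state = driver.execute_script("return document.readyState;")
        if state == "complete":
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"page not ready after {timeout} s (readyState={state!r})"
            )
        time.sleep(poll)
```

With a real driver this would be called right after the click, for example `html = wait_for_page_source(driver)`, and the result passed to `BeautifulSoup(html, 'html.parser')`.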

To get the outer HTML (tag included):

    from selenium.webdriver.common.by import By  # locator strategies (Selenium 4)

    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.outerHTML;")

    # HTML from `<body>`
    html = driver.execute_script("return document.body.outerHTML;")

    # HTML from element with some JavaScript
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].outerHTML;", element)

    # HTML from element with `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('outerHTML')

To get the inner HTML (tag excluded):

    from selenium.webdriver.common.by import By  # locator strategies (Selenium 4)

    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.innerHTML;")

    # HTML from `<body>`
    html = driver.execute_script("return document.body.innerHTML;")

    # HTML from element with some JavaScript
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].innerHTML;", element)

    # HTML from element with `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('innerHTML')
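
The JavaScript variants above differ only in the target node and in whether `outerHTML` or `innerHTML` is read, so they can be folded into one small helper. This is a sketch under that assumption; `get_html` and its parameters are hypothetical names, and only `execute_script` is an actual Selenium call.

```python
def get_html(driver, element=None, outer=True):
    """Return HTML for the whole page or for a single element.

    driver  -- object exposing Selenium's execute_script(script, *args)
    element -- a WebElement, or None to target the root <html> node
    outer   -- True for outerHTML (tag included), False for innerHTML
    """
    prop = "outerHTML" if outer else "innerHTML"
    if element is None:
        # Whole page: serialize from the document's root element.
        return driver.execute_script(f"return document.documentElement.{prop};")
    # Single element: Selenium passes it to the script as arguments[0].
    return driver.execute_script(f"return arguments[0].{prop};", element)
```

For the use case in the question, the result can be handed straight to the parser, for example `BeautifulSoup(get_html(driver), 'html.parser')`.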
Florent B. answered Oct 03 '22 23:10