I'm using Selenium to click through to the web page I want, and then parsing that page with Beautiful Soup.
Somebody has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.
Sample code in Python (based on the post above, the language does not seem to matter much):
    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    from bs4 import BeautifulSoup

    url = 'http://www.google.com'
    driver = webdriver.Firefox()
    driver.get(url)
    # Placeholder: this is the call I am looking for
    the_html = driver---somehow----.get_attribute('innerHTML')
    bs = BeautifulSoup(the_html, 'html.parser')
An element's innerHTML is the markup between its opening and closing tags, so reading that attribute returns the element's HTML content. In the Python bindings this is done by passing 'innerHTML' to the element's get_attribute method (getAttribute in Java).
To get the HTML source of a WebElement in Selenium WebDriver, first locate the element with one of the driver's locator methods (find_element_by_xpath or find_element_by_css_selector in older releases; find_element(By.XPATH, ...) or find_element(By.CSS_SELECTOR, ...) in Selenium 4), then call get_attribute on the element that is returned.
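The two steps just described (locate the element, then read its HTML attribute) can be sketched roughly as follows. The helper name `element_html` and its parameters are made up for illustration; the sketch assumes a Selenium 4 style driver whose `find_element` takes a locator strategy and a value.

```python
def element_html(driver, by, value, outer=False):
    """Return the HTML of the first element matched by (by, value).

    `element_html` is a hypothetical helper, not part of Selenium.
    With outer=False the element's own tag is excluded (innerHTML);
    with outer=True it is included (outerHTML).
    """
    element = driver.find_element(by, value)
    attribute = "outerHTML" if outer else "innerHTML"
    return element.get_attribute(attribute)


# Example usage (needs a real browser and its driver installed):
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   driver = webdriver.Firefox()
#   driver.get("http://stackoverflow.com")
#   print(element_html(driver, By.CSS_SELECTOR, "#hireme", outer=True))
#   driver.quit()
```

Passing the locator strategy in as an argument keeps the helper usable with any of the `By` constants, not only CSS selectors.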
To get the HTML for the whole page:
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("http://stackoverflow.com")
    html = driver.page_source
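Applied to the setup in the question, `page_source` can be handed straight to Beautiful Soup. The following is a minimal sketch, not a definitive implementation: the names `fetch_page_html` and `main` are hypothetical, and it assumes Firefox and geckodriver are installed. Note that `page_source` reflects the page at the moment it is read, so content that JavaScript adds later may require waiting before the call.

```python
def fetch_page_html(driver, url):
    """Load `url` in an existing WebDriver and return the whole page's HTML."""
    driver.get(url)
    return driver.page_source


def main():
    # Imported here so fetch_page_html can be reused without a browser set up.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox + geckodriver available
    try:
        the_html = fetch_page_html(driver, "http://www.google.com")
    finally:
        driver.quit()  # close the browser even if loading fails

    # Parse the full page with Beautiful Soup, as in the question.
    bs = BeautifulSoup(the_html, "html.parser")
    print(bs.title)
```

Calling `main()` runs the sketch against a real browser; `fetch_page_html` on its own works with any driver that is already open.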
To get the outer HTML (tag included):
    from selenium.webdriver.common.by import By

    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.outerHTML;")
    # HTML from `<body>`
    html = driver.execute_script("return document.body.outerHTML;")
    # HTML from element with some JavaScript
    # (Selenium 4 locator; find_element_by_css_selector was removed in 4.3)
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].outerHTML;", element)
    # HTML from element with `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('outerHTML')
To get the inner HTML (tag excluded):
    from selenium.webdriver.common.by import By

    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.innerHTML;")
    # HTML from `<body>`
    html = driver.execute_script("return document.body.innerHTML;")
    # HTML from element with some JavaScript
    # (Selenium 4 locator; find_element_by_css_selector was removed in 4.3)
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].innerHTML;", element)
    # HTML from element with `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('innerHTML')