
Saving full page content using Selenium

Tags:

selenium

I was wondering what's the best way to save all the files that are retrieved when Selenium visits a site. In other words, when Selenium visits http://www.google.com I want to save the HTML, JavaScript (including scripts referenced in src tags), images, and potentially content contained in iframes. How can this be done?

I know getHTMLSource() will return the HTML content of the main frame, but how can this be extended to download the complete set of files necessary to render that page again? Thanks in advance!

Rick asked Jun 15 '10 22:06

1 Answer

The only built-in method Selenium has for retrieving source content is page_source:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.someurl.com')
page_source = driver.page_source
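If only the main frame's HTML is needed, that string can simply be written to disk. A minimal sketch (the helper name and filename are arbitrary, not part of Selenium's API):

```python
# Minimal sketch: persist the HTML returned by driver.page_source.
# The helper takes the source as a plain string so it works with any driver.
def save_html(source, path="page.html"):
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)

# Usage with a live driver (assumed already created):
# save_html(driver.page_source)
```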

But that doesn't download all the images, CSS, and JS scripts like you would get if you used Ctrl+S on a webpage. So you'll need to emulate the Ctrl+S keystroke after you navigate to the page, as Algorithmatic has stated.

I made a gist to show how that's done: https://gist.github.com/GrilledChickenThighs/211c307edf8f828806c4bb4e4707b106

# Download entire webpage including all JavaScript, HTML, and CSS.
# Replicates pressing Ctrl+S while on a webpage.

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

def save_current_page(browser):
    # Send Ctrl+S to the browser window to trigger the "Save Page As" dialog.
    ActionChains(browser).send_keys(Keys.CONTROL, "s").perform()
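Note that the "Save Page As" dialog this opens is a native OS window, which Selenium cannot drive. An alternative sketch that stays inside Python is to parse page_source for asset references (img/script src, link href) and fetch each one yourself with the standard library; the class and function names below are illustrative, not part of Selenium's API:

```python
# Sketch: collect asset URLs from an HTML string such as driver.page_source,
# resolving relative paths against the page URL, then download each asset.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve

class AssetCollector(HTMLParser):
    """Gather src/href attributes from tags that reference external files."""
    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])

def collect_assets(html, base_url):
    # Return absolute URLs for every referenced asset.
    parser = AssetCollector()
    parser.feed(html)
    return [urljoin(base_url, a) for a in parser.assets]

# With a live driver, downloading would then look like:
# for url in collect_assets(driver.page_source, driver.current_url):
#     urlretrieve(url, url.rsplit("/", 1)[-1])
```

This only covers assets referenced directly in the HTML; files pulled in by scripts at runtime, or content inside iframes, would need a further pass (e.g. switching to each frame and repeating).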
pAulseperformance answered Nov 01 '22 13:11