I was wondering what's the best way to save all the files that are retrieved when Selenium visits a site. In other words, when Selenium visits http://www.google.com I want to save the HTML, the JavaScript (including scripts referenced via src attributes), the images, and potentially the content contained in iframes. How can this be done?
I know getHTMLSource() will return the HTML content in the body of the main frame, but how can this be extended to download the complete set of files necessary to render that page again? Thanks in advance!
The only built-in method Selenium has for downloading source content is driver.page_source:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.someurl.com')  # get() needs a full URL, including the scheme
page_source = driver.page_source
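Note that page_source only gives you the rendered HTML of the main frame. If you also want the referenced assets, one option is to read their URLs out of the live DOM and fetch them separately. Here is a minimal sketch of that idea, assuming the requests library is installed; save_page_assets and the flat output_dir layout are names of my own choosing:

import os
from urllib.parse import urljoin, urlparse

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

def save_page_assets(driver, output_dir):
    # Write the rendered HTML of the main frame to disk
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, 'page.html'), 'w', encoding='utf-8') as f:
        f.write(driver.page_source)

    # Collect the URLs of images, scripts, and stylesheets from the live DOM
    urls = set()
    for img in driver.find_elements(By.TAG_NAME, 'img'):
        urls.add(img.get_attribute('src'))
    for script in driver.find_elements(By.TAG_NAME, 'script'):
        urls.add(script.get_attribute('src'))
    for link in driver.find_elements(By.TAG_NAME, 'link'):
        if link.get_attribute('rel') == 'stylesheet':
            urls.add(link.get_attribute('href'))

    # Fetch each asset and save it next to the HTML
    for url in filter(None, urls):
        absolute = urljoin(driver.current_url, url)
        filename = os.path.basename(urlparse(absolute).path) or 'asset'
        response = requests.get(absolute, timeout=10)
        with open(os.path.join(output_dir, filename), 'wb') as f:
            f.write(response.content)

One caveat: requests fetches these outside the browser session, so cookies and authentication from Selenium won't carry over, and content inside iframes would need the same treatment after driver.switch_to.frame().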
But page_source alone doesn't download all the images, CSS, and JS scripts like you would get if you used Ctrl+S on a webpage. So you'll need to emulate the Ctrl+S keystroke after you navigate to the page, as Algorithmatic has stated.
I made a gist to show how that's done: https://gist.github.com/GrilledChickenThighs/211c307edf8f828806c4bb4e4707b106
# Download entire webpage including all JavaScript, HTML, and CSS. Replicates Ctrl+S when on a webpage.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

def save_current_page(driver):
    # Hold Ctrl down while pressing 's' so the browser receives the chord
    ActionChains(driver).key_down(Keys.CONTROL).send_keys('s').key_up(Keys.CONTROL).perform()
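A caveat with the Ctrl+S approach: it opens the browser's native save dialog, which Selenium itself can't interact with, so it only works if the browser is configured to save without prompting. On a Chromium-based browser with Selenium 4, an alternative worth sketching is to capture the page as a single MHTML archive through the DevTools protocol; the page.mhtml filename here is my own choice:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.someurl.com')

# Page.captureSnapshot serializes the page and its resources as MHTML
snapshot = driver.execute_cdp_cmd('Page.captureSnapshot', {'format': 'mhtml'})
with open('page.mhtml', 'w', encoding='utf-8') as f:
    f.write(snapshot['data'])

The resulting .mhtml file bundles the HTML together with its resources and can be reopened directly in Chrome.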