After trying out various approaches, I stumbled upon this page on taking a full-page screenshot with chromedriver, selenium and python.
The original code is here (I have also copied it into this post below).
It uses PIL and it works great! However, there is one issue: it captures fixed headers and repeats them down the whole page, and it also misses some parts of the page where one captured section ends and the next begins. Sample URL to take a screenshot of:
http://www.w3schools.com/js/default.asp
How can I avoid the repeated headers with this code? Or is there a better option that uses Python only? (I don't know Java and do not want to use Java.)
Please see the screenshot of the current result and the sample code below.
test.py
"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util


class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        #url = "http://effbot.org/imagingbook/introduction.htm"
        url = "http://www.w3schools.com/js/default.asp"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])
util.py
import os
import time

from PIL import Image


def fullpage_screenshot(driver, file):
    print("Starting chrome full page screenshot workaround ...")

    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height, viewport_width, viewport_height))

    # Split the page into viewport-sized rectangles (left, top, right, bottom).
    rectangles = []
    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height
        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width
            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))
            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        # The first rectangle is captured without scrolling.
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))
        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        # The last row cannot scroll a full viewport, so align it to the bottom.
        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True
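One possible way to stop the header from repeating, offered only as a sketch: hide every fixed or sticky element after the first section has been captured, so the header appears once at the top of the stitched image. The helper name `hide_fixed_elements` and the constant `HIDE_FIXED_JS` are made up for this illustration and are not part of the original script. It also assumes that hiding those elements does not change the page height.

```python
# Hypothetical helper (not part of the original util.py).
# The JavaScript hides every element whose computed position is
# "fixed" or "sticky" and returns how many elements were hidden.
HIDE_FIXED_JS = """
var hidden = 0;
var nodes = document.querySelectorAll('*');
for (var i = 0; i < nodes.length; i++) {
    var pos = window.getComputedStyle(nodes[i]).position;
    if (pos === 'fixed' || pos === 'sticky') {
        nodes[i].style.setProperty('visibility', 'hidden', 'important');
        hidden++;
    }
}
return hidden;
"""


def hide_fixed_elements(driver):
    """Hide fixed/sticky elements on the current page.

    `driver` is any Selenium WebDriver; only execute_script() is used.
    Returns the number of elements that were hidden.
    """
    return driver.execute_script(HIDE_FIXED_JS)
```

In the stitching loop it could be called in the branch that runs for every section after the first, just before the `window.scrollTo` call. `visibility: hidden` is used rather than `display: none` so that the hidden elements keep their layout box.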
In Selenium's Java bindings, screenshots are taken through an interface called TakesScreenshot, which lets the WebDriver capture a screenshot and store it in different ways. Its method getScreenshotAs() captures the screenshot and returns it in the requested output type.
In Python, you can take a screenshot of a webpage with the get_screenshot_as_file() method, passing the filename as its parameter. The original example used Firefox to load the page, but any web browser will do. A relative filename is resolved against the current working directory, which is often the directory you ran the script from.
Taking a screenshot is one of Selenium's most useful features. The save_screenshot() method does the same thing: it saves the visible part of the webpage as a PNG file.
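A minimal sketch of the Python call described above. The wrapper name `capture_viewport` is hypothetical; the only Selenium call it relies on is `save_screenshot()`, which returns `False` when the file could not be written. The driver is passed in, so the same function works with Firefox, Chrome or any other WebDriver.

```python
def capture_viewport(driver, path="screenshot.png"):
    """Save the currently visible viewport to `path` as a PNG.

    `driver` is any Selenium WebDriver. Raises OSError if the driver
    reports that the file could not be written.
    """
    if not path.lower().endswith(".png"):
        # Selenium always writes PNG data, whatever the extension says.
        raise ValueError("use a filename ending in .png")
    if not driver.save_screenshot(path):
        raise OSError("could not write screenshot to {0}".format(path))
    return path


# Example use (needs selenium and a browser driver installed):
#
#     from selenium import webdriver
#     driver = webdriver.Firefox()
#     driver.get("https://www.python.org")
#     capture_viewport(driver, "python_org.png")
#     driver.quit()
```

Note that this captures only what is visible in the browser window, not the full page.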
This answer improves upon prior answers by am05mhz and Javed Karim.
It assumes headless mode, and that a window-size option was not initially set. Before calling this function, ensure the page has loaded fully or sufficiently.
It attempts to set both the width and the height to what the page needs. A screenshot of the entire page can sometimes include a needless vertical scrollbar; one way to generally avoid it is to take a screenshot of the body element instead. After saving the screenshot, it restores the original window size; otherwise the size may not be set correctly for the next screenshot.
Ultimately this technique may still not work perfectly well for some examples.
from selenium import webdriver
from selenium.webdriver.common.by import By


def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    # find_element_by_tag_name() was removed in Selenium 4.3; find_element(By.TAG_NAME, ...) replaces it.
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])
The type annotations in the function definition are valid in every Python 3 release; remove them only if you are running Python 2.
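Because a failure to restore the window size can break the next screenshot, one option is to wrap the resize in a context manager so the original size comes back even if the capture raises. This is a sketch only: `full_page_window` is a made-up name, and it assumes headless mode, as the answer above does.

```python
from contextlib import contextmanager


@contextmanager
def full_page_window(driver):
    """Temporarily resize the window to the full document size.

    `driver` is any Selenium WebDriver. Yields (width, height) and
    always restores the original window size on exit.
    """
    original = driver.get_window_size()
    width = driver.execute_script("return document.body.parentNode.scrollWidth")
    height = driver.execute_script("return document.body.parentNode.scrollHeight")
    driver.set_window_size(width, height)
    try:
        yield width, height
    finally:
        # Runs even if taking the screenshot raised an exception.
        driver.set_window_size(original["width"], original["height"])


# Example use (needs selenium and chromedriver installed):
#
#     from selenium.webdriver.common.by import By
#     with full_page_window(driver):
#         driver.find_element(By.TAG_NAME, "body").screenshot("/tmp/screenshot.png")
```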
Screenshots are limited to the viewport, but you can get around this by capturing the body element, because the webdriver captures the entire element even if it is larger than the viewport. This saves you from having to deal with scrolling and stitching images; however, you might see problems with the footer position (as in the screenshot below).
Tested on Windows 8 and macOS High Sierra with ChromeDriver.
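from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://stackoverflow.com/'
path = '/path/to/save/in/scrape.png'

driver = webdriver.Chrome()
driver.get(url)
# find_element_by_tag_name() was removed in Selenium 4.3; find_element(By.TAG_NAME, ...) replaces it.
el = driver.find_element(By.TAG_NAME, 'body')
el.screenshot(path)  # captures the whole element, not just the viewport
driver.quit()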
Returns: (full size: https://i.stack.imgur.com/ppDiI.png)