
Take screenshot of full page with Selenium Python with chromedriver

After trying out various approaches, I stumbled upon this page, which shows how to take a full-page screenshot with chromedriver, Selenium and Python.

The original code is here (I have also copied it into this post below).

It uses PIL and it works great! However, there is one issue: it captures fixed headers and repeats them down the whole page, and it also misses some parts of the page when it scrolls from one captured section to the next. Sample URL to take a screenshot of:

http://www.w3schools.com/js/default.asp

How can I avoid the repeated headers with this code? Or is there a better option that uses Python only? (I don't know Java and do not want to use Java.)

Please see the screenshot of the current result and sample code below.

full page screenshot with repeated headers

test.py

"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util


class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        #url = "http://effbot.org/imagingbook/introduction.htm"
        url = "http://www.w3schools.com/js/default.asp"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])

util.py

import os
import time

from PIL import Image


def fullpage_screenshot(driver, file):

    print("Starting chrome full page screenshot workaround ...")

    # Measure the full document and the visible viewport, in CSS pixels.
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height, viewport_width, viewport_height))
    rectangles = []

    # Split the page into a grid of viewport-sized rectangles.
    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        # The first rectangle is captured without scrolling.
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))

        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        # The last row cannot scroll a full viewport, so paste it flush with the bottom.
        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True
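To make the tiling in util.py easier to follow, here is a minimal sketch of the same rectangle and paste-offset arithmetic as a pure function, with no browser involved. The name `tile_rectangles` and its return shape are illustrative assumptions, not part of the original script.

```python
def tile_rectangles(total_width, total_height, viewport_width, viewport_height):
    """Return a list of (rectangle, paste_offset) pairs covering the page.

    rectangle is (left, top, right, bottom); paste_offset is where the
    viewport screenshot taken at (left, top) is pasted in the stitched image.
    """
    tiles = []
    for top in range(0, total_height, viewport_height):
        bottom = min(top + viewport_height, total_height)
        for left in range(0, total_width, viewport_width):
            right = min(left + viewport_width, total_width)
            # The browser cannot scroll past the end of the page, so the
            # screenshot for a partial last row shows the final
            # viewport_height pixels and is pasted flush with the bottom.
            if top + viewport_height > total_height:
                offset = (left, total_height - viewport_height)
            else:
                offset = (left, top)
            tiles.append(((left, top, right, bottom), offset))
    return tiles
```

For a page of 1000 x 2500 pixels and a viewport of 1000 x 1000, this gives three tiles with paste offsets (0, 0), (0, 1000) and (0, 1500): the last capture overlaps the previous one by 500 pixels rather than leaving a gap.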
ihightower asked Jan 18 '17 14:01




2 Answers

This answer improves upon prior answers by am05mhz and Javed Karim.

It assumes headless mode, and that a window-size option was not initially set. Before calling this function, ensure the page has loaded fully or sufficiently.
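As a rough illustration of "loaded fully or sufficiently", the sketch below polls `document.readyState` through `execute_script` until it reports `complete`. Treating `complete` as sufficient is an assumption: pages that load content lazily may need a more specific wait. The helper name `wait_for_document_ready` and its parameters are hypothetical.

```python
import time


def wait_for_document_ready(driver, timeout=10.0, poll_interval=0.25):
    """Poll document.readyState until it is 'complete' or the timeout passes.

    Returns True if the document reported 'complete', False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

It would be called after `driver.get(url)` and before taking the screenshot.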

It attempts to set both the width and the height to what is necessary. A screenshot of the entire page can sometimes include a needless vertical scrollbar; one way to generally avoid it is to take a screenshot of the body element instead. After saving the screenshot, the function restores the window to its original size; otherwise, the size for the next screenshot may not be set correctly.

Ultimately, this technique may still not work perfectly for some pages.

from selenium import webdriver
from selenium.webdriver.common.by import By


def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    # find_element(By.TAG_NAME, ...) replaces find_element_by_tag_name, which was removed in Selenium 4.
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

The type annotations in the function definition require Python 3; if using Python 2, remove them.
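Because a wrong window size can affect the next screenshot, it may be worth restoring the size even when the capture itself raises. One way to sketch that is a small context manager. The name `temporary_window_size` is hypothetical; it relies only on the `get_window_size` and `set_window_size` calls already used above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_window_size(driver, width, height):
    """Resize the window for the duration of a with-block, then restore it."""
    original = driver.get_window_size()
    driver.set_window_size(width, height)
    try:
        yield
    finally:
        # Runs even if the screenshot call inside the block fails.
        driver.set_window_size(original['width'], original['height'])


# Possible usage inside save_screenshot (sketch only):
# with temporary_window_size(driver, required_width, required_height):
#     driver.find_element(By.TAG_NAME, 'body').screenshot(path)
```

The `try`/`finally` is the only behavioural difference from the function above: the original size comes back whether or not the screenshot succeeds.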

Asclepius answered Sep 20 '22 20:09



Screenshots are limited to the viewport, but you can get around this by capturing the body element, as the webdriver will capture the entire element even if it is larger than the viewport. This saves you from having to deal with scrolling and stitching images; however, you might see problems with the footer position (as in the screenshot below).

Tested on Windows 8 and macOS High Sierra with ChromeDriver.

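from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://stackoverflow.com/'
path = '/path/to/save/in/scrape.png'

driver = webdriver.Chrome()
driver.get(url)
# find_element(By.TAG_NAME, ...) replaces find_element_by_tag_name, which was removed in Selenium 4.
el = driver.find_element(By.TAG_NAME, 'body')
el.screenshot(path)
driver.quit()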

Returns: (full size: https://i.stack.imgur.com/ppDiI.png)

SO_scrape
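Elements with `position: fixed` or `position: sticky` are drawn relative to the viewport. That is one plausible cause of both the repeated header in the question and a misplaced footer in an element capture, though this has not been verified for the pages above. As a hedged sketch that is not part of this answer as tested, such elements could be hidden before capturing. The names `HIDE_FIXED_ELEMENTS_JS` and `hide_fixed_elements` are hypothetical. Note that this hides the elements rather than repositioning them, so they will be missing from the image.

```python
# JavaScript run in the page: hide every fixed or sticky element under <body>
# and report how many were hidden.
HIDE_FIXED_ELEMENTS_JS = """
var hidden = 0;
var elements = document.querySelectorAll('body *');
for (var i = 0; i < elements.length; i++) {
    var position = window.getComputedStyle(elements[i]).position;
    if (position === 'fixed' || position === 'sticky') {
        elements[i].style.setProperty('visibility', 'hidden', 'important');
        hidden++;
    }
}
return hidden;
"""


def hide_fixed_elements(driver):
    """Hide fixed/sticky elements in the current page; return the count hidden."""
    return driver.execute_script(HIDE_FIXED_ELEMENTS_JS)
```

With the scroll-and-stitch approach from the question, calling this after the first tile has been captured would keep the header once at the top of the image; with the body-element capture, it would be called just before `el.screenshot(path)`.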

alexalex answered Sep 18 '22 20:09
