Take screenshot of full page with Selenium Python with chromedriver

After trying out various approaches, I came across this page, which describes how to take a full-page screenshot with chromedriver, Selenium, and Python.

The original code is here (I have also copied it into this post below).

It uses PIL and works well. However, there is one issue: it captures fixed headers and repeats them down the whole stitched image, and it also misses some parts of the page when moving from one captured viewport to the next. A sample URL to take a screenshot of:

http://www.w3schools.com/js/default.asp

How can I avoid the repeated headers with this code? Or is there a better option that uses Python only? (I don't know Java and do not want to use it.)

Please see the screenshot of the current result and sample code below.

[Image: full page screenshot with repeated headers]

test.py

"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util


class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        #url = "http://effbot.org/imagingbook/introduction.htm"
        url = "http://www.w3schools.com/js/default.asp"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])

util.py

import os
import time

from PIL import Image


def fullpage_screenshot(driver, file):

    print("Starting chrome full page screenshot workaround ...")

    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height, viewport_width, viewport_height))
    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))

        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True
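One possible way to reduce the repeated headers in a scroll-and-stitch approach like the one above is to hide fixed and sticky elements once the first viewport has been captured. The following is only a sketch under assumptions: the helper names `hide_fixed_elements` and `restore_fixed_elements` and the `data-prev-visibility` attribute are hypothetical, and it has not been verified against the sample URL.

```python
# JavaScript that hides every element whose computed position is fixed or
# sticky. The previous inline visibility is stored in a (hypothetical)
# data-prev-visibility attribute so it can be restored afterwards.
HIDE_FIXED_JS = """
var hidden = 0;
document.querySelectorAll('*').forEach(function (el) {
    var pos = window.getComputedStyle(el).position;
    if (pos === 'fixed' || pos === 'sticky') {
        el.setAttribute('data-prev-visibility', el.style.visibility);
        el.style.visibility = 'hidden';
        hidden += 1;
    }
});
return hidden;
"""

# JavaScript that undoes HIDE_FIXED_JS.
RESTORE_FIXED_JS = """
document.querySelectorAll('[data-prev-visibility]').forEach(function (el) {
    el.style.visibility = el.getAttribute('data-prev-visibility');
    el.removeAttribute('data-prev-visibility');
});
"""


def hide_fixed_elements(driver):
    """Hide fixed/sticky elements and return how many were hidden."""
    return driver.execute_script(HIDE_FIXED_JS)


def restore_fixed_elements(driver):
    """Restore the visibility of elements hidden by hide_fixed_elements."""
    driver.execute_script(RESTORE_FIXED_JS)
```

In `fullpage_screenshot`, `hide_fixed_elements(driver)` could be called right after the first part has been captured, so the header appears once at the top of the stitched image, and `restore_fixed_elements(driver)` could be called before returning. Elements hidden with `visibility` still occupy their space, so the page layout should not shift between captures.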

436

asked Jan 18 '17 14:01

ihightower

2 Answers

This answer improves upon prior answers by am05mhz and Javed Karim.

It assumes headless mode, and that a window-size option was not initially set. Before calling this function, ensure the page has loaded fully or sufficiently.

It attempts to set both the window width and height to what the page requires. A screenshot of the entire page can sometimes include a needless vertical scrollbar; one way to generally avoid it is to take a screenshot of the body element instead. After saving the screenshot, it restores the window to its original size; otherwise the size for the next screenshot may not be set correctly.

Ultimately, this technique may still not work perfectly for some pages.

from selenium import webdriver
from selenium.webdriver.common.by import By


def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    # find_element_by_tag_name was removed in Selenium 4.3; find_element(By.TAG_NAME, ...) replaces it
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

Function annotations are valid syntax in every Python 3 release, so the type annotations only need to be removed from the function definition if the code must run under Python 2.
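Because the answer stresses that the window size must be reverted after the screenshot, the resize-and-restore step could also be wrapped in a context manager so the original size is restored even if the screenshot call raises. This is a minimal sketch, not part of the original answer; the name `page_sized_window` is hypothetical, and `driver` is assumed to be a Selenium WebDriver (any object with the same three methods works).

```python
from contextlib import contextmanager


@contextmanager
def page_sized_window(driver):
    """Temporarily resize the browser window to the full document size."""
    original = driver.get_window_size()
    width = driver.execute_script('return document.body.parentNode.scrollWidth')
    height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(width, height)
    try:
        yield width, height
    finally:
        # Runs even if the body of the with-block raises.
        driver.set_window_size(original['width'], original['height'])


# Possible usage (assumes selenium's By has been imported):
#
#     with page_sized_window(driver):
#         driver.find_element(By.TAG_NAME, 'body').screenshot('/tmp/screenshot.png')
```

The `try`/`finally` is the only behavioural difference from the function above: a failed screenshot no longer leaves the window at the enlarged size.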

126

answered Sep 20 '22 20:09

Asclepius

Screenshots are limited to the viewport, but you can get around this by capturing the body element, because the webdriver captures the entire element even if it is larger than the viewport. This saves you from having to deal with scrolling and stitching images; however, you might see problems with the footer position (as in the screenshot below).

Tested on Windows 8 and macOS High Sierra with ChromeDriver.

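from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://stackoverflow.com/'
path = '/path/to/save/in/scrape.png'

driver = webdriver.Chrome()
driver.get(url)
# find_element_by_tag_name was removed in Selenium 4.3; find_element(By.TAG_NAME, ...) replaces it
el = driver.find_element(By.TAG_NAME, 'body')
el.screenshot(path)
driver.quit()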

Returns: (full size: https://i.stack.imgur.com/ppDiI.png)

[Image: SO_scrape]
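To check whether a body capture really covers more than the viewport on a given page, one could compare the height of the captured PNG with the document's scroll height. The sketch below reads the dimensions straight from the PNG header using only the standard library; the function names `png_size` and `captured_vs_page_height` are hypothetical. The two numbers are not expected to match exactly: body margins and a device pixel ratio other than 1 can both make them differ.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_size(png_bytes):
    """Return (width, height) read from the IHDR chunk of a PNG image."""
    if png_bytes[:8] != PNG_SIGNATURE or png_bytes[12:16] != b'IHDR':
        raise ValueError('not a PNG image')
    # Width and height are two big-endian 32-bit integers after 'IHDR'.
    return struct.unpack('>II', png_bytes[16:24])


def captured_vs_page_height(driver, body_element):
    """Return (captured height, document scroll height), both in pixels.

    body_element is assumed to be the WebElement for <body>; its
    screenshot_as_png property returns the capture as PNG bytes.
    """
    _, captured_height = png_size(body_element.screenshot_as_png)
    page_height = driver.execute_script(
        'return document.body.parentNode.scrollHeight')
    return captured_height, page_height
```

A captured height close to the viewport height rather than the scroll height would suggest the element capture was clipped on that page.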

answered Sep 18 '22 20:09

alexalex

Tags: python, selenium, selenium-chromedriver, webpage-screenshot
