How to cache playwright-python contexts for testing?

Question

I am doing some web scraping using playwright-python>=1.41, and have to launch the browser in headed mode (e.g. launch(headless=False)).

For CI testing, I would like to somehow cache the headed interactions with Chromium, to enable offline testing:

First invocation: uses Chromium to make real-world HTTP transactions
Later invocations: uses Chromium, but all HTTP transactions read from a cache

How can this be done? I can't find any clear answers on how to do this.

candre · Accepted Answer

Recording a HAR file may solve your problem:

Run the test once while recording a HAR file
Store the HAR file as an artifact, in your repo or somewhere similar in your CI environment
Run the test again using the recorded HAR file

Here is how to do that with playwright==1.41.1 and pytest-playwright==0.3.3:

import pathlib

import pytest
from playwright.sync_api import Browser, Playwright

CACHE_DIR = pathlib.Path(__file__).parent / "cache"


@pytest.fixture(name="example_har", scope="session")
def fixture_example_har(playwright: Playwright) -> pathlib.Path:
    """Record real traffic to a HAR file (needs network access)."""
    har_file = CACHE_DIR / "example.har"
    with (
        playwright.chromium.launch(headless=False) as browser,
        browser.new_page() as page,
    ):
        # update=True records live responses instead of replaying the HAR
        page.route_from_har(har_file, url="*/**", update=True)
        page.goto("https://example.com/")
    # The HAR file is written when the page's context closes
    return har_file


def test_caching(browser: Browser, example_har: pathlib.Path) -> None:
    # offline=True means responses can only come from the HAR file
    with browser.new_context(offline=True) as context:
        page = context.new_page()
        page.route_from_har(example_har, url="*/**")
        page.goto("https://example.com/")
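The fixture above passes update=True on every run, so it always goes to the network. To get closer to the "record on the first invocation, replay afterwards" behaviour from the question, one option is to choose the route_from_har() arguments based on whether the HAR file already exists. This is only a minimal sketch: har_route_kwargs and open_page_with_har are hypothetical helper names, and the "**/*" URL pattern is an assumption meant to match every request.

```python
import pathlib


def har_route_kwargs(har_file: pathlib.Path) -> dict:
    """Pick keyword arguments for page.route_from_har() (hypothetical helper)."""
    if har_file.exists():
        # Replay: serve from the HAR and abort requests it does not contain
        return {"url": "**/*", "not_found": "abort"}
    # First run: let requests reach the network and record them
    return {"url": "**/*", "update": True}


def open_page_with_har(browser, har_file: pathlib.Path):
    """Open a page on a Playwright Browser that records or replays har_file."""
    context = browser.new_context()
    page = context.new_page()
    page.route_from_har(har_file, **har_route_kwargs(har_file))
    # Close the context when done: a recorded HAR is only written on close
    return context, page
```

With this, deleting the HAR file (or not committing it) forces a fresh recording, while CI runs that have the file never need the network.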

naved196 · Answer

Use the BrowserContext methods set_extra_http_headers() and set_offline() to cache headed interactions, and launch the browser with a specific cache directory so that the same cache is used across multiple invocations of your script.

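from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        chromium_sandbox=False,
        # Chromium switch that sets the disk cache directory
        args=["--disk-cache-dir=/path/to/cache"],
    )

    # set_offline() and set_extra_http_headers() live on the context, not the browser
    context = browser.new_context()
    context.set_extra_http_headers({"Cache-Control": "max-age=31536000"})
    context.set_offline(True)
    page = context.new_page()

    # Do web scraping here

    browser.close()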

The error playwright._impl._errors.Error: net::ERR_INTERNET_DISCONNECTED means that the browser is still trying to make network requests even though you have set the context to offline.

This can happen when the page still tries to load external resources, such as images or stylesheets.

Make sure that your web scraping code only interacts with the cached content.
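To see which resources are still being requested, it can help to listen for Playwright's "requestfailed" page event. The following is a minimal sketch; collect_failed_requests is a hypothetical helper name, and it assumes you register it before navigating.

```python
def collect_failed_requests(page) -> list:
    """Record (url, failure text) for each request that fails on a Playwright Page."""
    failures = []
    page.on(
        "requestfailed",
        lambda request: failures.append((request.url, request.failure)),
    )
    return failures
```

Call it right after creating the page, run page.goto(...), and then inspect the returned list: each entry names a URL that could not be served while offline.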

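Try waiting for the network to go idle before setting the context to offline. set_offline() itself takes no timeout option, but page.wait_for_load_state("networkidle") waits until there have been no network connections for at least 500 ms, so that pending requests can complete first. Here page and context are your Page and BrowserContext objects:

page.wait_for_load_state("networkidle", timeout=5000)
context.set_offline(True)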

Tags: python, caching, playwright, playwright-python

Intrastellar Explorer
