
How to cache playwright-python contexts for testing?

I am doing some web scraping using playwright-python>=1.41, and have to launch the browser in headed mode (i.e. launch(headless=False)).

For CI testing, I would like to cache the headed interactions with Chromium so that the tests can run offline:

  • First invocation: uses Chromium to make real-world HTTP transactions
  • Later invocations: uses Chromium, but all HTTP transactions read from a cache

How can this be done? I can't find any clear answers on how to do this.

Intrastellar Explorer asked Feb 05 '26 00:02


2 Answers

Recording a HAR file might solve your problem:

  1. Run the test once while recording a HAR file.
  2. Store the HAR file as an artifact, in your repo or a similar location in your CI environment.
  3. Run the test again, replaying the recorded HAR file.
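Before storing the HAR file (step 2), it can be worth checking which requests it actually captured, because on replay Playwright aborts requests that have no matching entry by default. Below is a minimal stdlib-only sketch, assuming a plain JSON HAR 1.2 file (log.entries[].request.method and .url). The names load_har, summarize_har, and missing_urls are hypothetical helpers, not part of Playwright.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def load_har(har_path: Path) -> dict:
    """Parse a HAR file (the plain JSON variant, not the zipped one)."""
    with har_path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize_har(har: dict) -> Counter:
    """Count the recorded entries per (method, URL) pair."""
    counts: Counter = Counter()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        counts[(request["method"], request["url"])] += 1
    return counts


def missing_urls(har: dict, required: list[str]) -> list[str]:
    """Return the URLs in `required` that have no recorded GET entry."""
    recorded = {url for method, url in summarize_har(har) if method == "GET"}
    return [url for url in required if url not in recorded]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one line per recorded (method, URL) with its count
    for (method, url), n in sorted(summarize_har(load_har(Path(sys.argv[1]))).items()):
        print(f"{n:3d} {method} {url}")
```

Saved under a hypothetical name such as summarize_har.py, running it with the path to cache/example.har lists everything the recording contains, and missing_urls can back a CI check that fails early instead of during replay.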

Here is how to do that with playwright==1.41.1 and pytest-playwright==0.3.3:

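import pathlib

import pytest
from playwright.sync_api import Browser, Playwright

# The HAR file lives next to this test module so it can be committed or stored as a CI artifact
CACHE_DIR = pathlib.Path(__file__).parent / "cache"


@pytest.fixture(name="example_har", scope="session")
def fixture_example_har(playwright: Playwright) -> pathlib.Path:
    """Visit the site in headed Chromium and record its traffic to a HAR file."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    har_file = CACHE_DIR / "example.har"
    # Parenthesized context managers require Python 3.10+
    with (
        playwright.chromium.launch(headless=False) as browser,
        browser.new_page() as page,
    ):
        # update=True records real responses; the HAR is written when the page's context closes
        page.route_from_har(har_file, url="*/**", update=True)
        page.goto("https://example.com/")
    return har_file


def test_caching(browser: Browser, example_har: pathlib.Path) -> None:
    # offline=True ensures nothing reaches the real network during the test
    with browser.new_context(offline=True) as context:
        page = context.new_page()
        # Without update=True, matching requests are fulfilled from the HAR file
        page.route_from_har(example_har, url="*/**")
        page.goto("https://example.com/")
        assert page.title() == "Example Domain"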
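As written, the fixture passes update=True on every test session, so each run launches headed Chromium and re-records from the live site, whereas the question asks for later runs to read from the cache. One way to get record-once, replay-later behavior is to record only when the HAR file is missing or a re-record is explicitly requested. This is a minimal sketch; should_record, attach_har, and the RECORD_HAR environment variable are hypothetical names, not Playwright APIs.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical environment variable: set RECORD_HAR=1 to force a fresh recording
RECORD_ENV_VAR = "RECORD_HAR"


def should_record(har_file: Path, env: Mapping[str, str] = os.environ) -> bool:
    """Record on the first run (no HAR yet) or when explicitly requested; replay otherwise."""
    return env.get(RECORD_ENV_VAR) == "1" or not har_file.exists()


def attach_har(page, har_file: Path, env: Mapping[str, str] = os.environ) -> bool:
    """Route `page` (a playwright.sync_api.Page) through `har_file`.

    Returns True if this run records real traffic, False if it replays from the file.
    """
    record = should_record(har_file, env)
    page.route_from_har(har_file, url="*/**", update=record)
    return record
```

In the fixture above, wrapping the with block in "if should_record(har_file):" would skip launching headed Chromium entirely once cache/example.har exists, so CI runs only replay the stored file.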

candre answered Feb 07 '26 15:02


Use the browser context's set_extra_http_headers() and set_offline() methods to serve headed interactions from the cache, and launch the browser with a fixed cache directory so that multiple invocations of your script share the same cache.

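from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Point Chromium's disk cache at a fixed directory so it can be reused between runs
    browser = p.chromium.launch(
        headless=False,
        chromium_sandbox=False,
        args=["--disk-cache-dir=/path/to/cache"],
    )
    # set_offline() and set_extra_http_headers() are BrowserContext methods, so create a context first
    context = browser.new_context()
    context.set_offline(True)
    # This header is added to every request made from this context
    context.set_extra_http_headers({"Cache-Control": "max-age=31536000"})
    page = context.new_page()

    # Do web scraping here

    browser.close()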

If this fails with playwright._impl._errors.Error: net::ERR_INTERNET_DISCONNECTED, the browser is still trying to make network requests even though the context is offline.

This can happen when your scraping code still requests external resources, such as images or stylesheets, that are not in the cache.

Make sure that your scraping code only interacts with cached content.
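One way to keep the scraper from requesting images or stylesheets at all is to abort those requests with a route handler on the context. This is a minimal sketch, assuming you don't need those resources for the data you extract; BLOCKED_RESOURCE_TYPES, should_abort, and block_unneeded_resources are hypothetical names, while context.route(), route.abort(), route.continue_(), and request.resource_type are Playwright's sync API.

```python
# Resource types that scraping usually does not need; adjust to your pages
BLOCKED_RESOURCE_TYPES = frozenset({"image", "stylesheet", "font", "media"})


def should_abort(resource_type: str) -> bool:
    """Decide whether a request of this Playwright resource type should be dropped."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def block_unneeded_resources(context) -> None:
    """Abort image/stylesheet/font/media requests for every page in `context`.

    `context` is a playwright.sync_api.BrowserContext; call this before opening pages.
    """

    def handle(route) -> None:
        if should_abort(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    context.route("**/*", handle)
```

This only prevents those subresource requests; the main document and any scripts or XHR calls the scraper depends on still have to come from the cache.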

Also try waiting for all network requests to complete before setting the context to offline. set_offline() only accepts a boolean and has no network-idle timeout option, but the page can wait for the "networkidle" load state first:

page.wait_for_load_state("networkidle", timeout=5000)
context.set_offline(True)

naved196 answered Feb 07 '26 14:02