Puppeteer: how to store a session (including cookies, page state, local storage, etc.) and continue later?

Is it possible to have a Puppeteer script that opens and interacts with a page, then saves that browser session as-is, and have another script load it and continue from there?

By "browser session" I mean the currently loaded page including the page state (DOM and JavaScript variables etc.), cookies, local storage, the whole shebang. Basically everything it needs to continue exactly where the previous script left off.

If not, then is it possible to at least export and import cookies and local storage? So I can reload a particular page and continue processing, keeping any login or session data intact.

asked Sep 18 '19 at 07:09 by RocketNuts

2 Answers

I can't say for sure, but since Puppeteer is "just" a wrapper around the Chrome DevTools Protocol (CDP), and CDP has no native command that does what you are asking for, it's not possible to save and restore the whole shebang.

But you have options. One good option is to reuse the same browser profile in the next script: just pass the "userDataDir" option to puppeteer.launch(). Example: puppeteer.launch({ userDataDir: '/tmp/myChromeSession' });. Every Puppeteer script that uses this directory will use the same browser profile, so they will share the "permanent" cookies. The "session" cookies (and any cookie whose expiration time has passed) do get deleted, but that is how cookies are supposed to work.

Excerpt about User Data Directory:

The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.

Although that reference doesn't mention Web Storage, it is preserved in the user data directory too. So with this option you are good to go; I think it is the best option for your case.
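A minimal sketch of the userDataDir approach, assuming Puppeteer is installed. The Puppeteer module is passed in as a parameter (in a real script it is require('puppeteer')), and firstRun, secondRun and the example URL are hypothetical names. Only the stored state (cookies, Web Storage, IndexedDB) carries over; the second script reloads the page, so the DOM and JavaScript variables start fresh.

```javascript
// Same directory as in the answer; any writable path works.
const PROFILE_DIR = '/tmp/myChromeSession';

// Script 1 (hypothetical): build up state, e.g. log in, then close the browser.
// Persistent cookies, localStorage and IndexedDB end up in PROFILE_DIR.
async function firstRun(puppeteer, url) {
  const browser = await puppeteer.launch({ userDataDir: PROFILE_DIR });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // ... interact with the page here ...
  } finally {
    // A clean shutdown gives Chrome the chance to flush the profile to disk.
    await browser.close();
  }
}

// Script 2 (hypothetical): launch against the same profile and load the page again.
// Stored state is picked up; DOM and JS variables are NOT restored.
async function secondRun(puppeteer, url) {
  const browser = await puppeteer.launch({ userDataDir: PROFILE_DIR });
  const page = await browser.newPage();
  await page.goto(url);
  return { browser, page };
}
```

For example, call firstRun(require('puppeteer'), 'https://example.com') in one process and secondRun(require('puppeteer'), 'https://example.com') in the next. Chrome generally refuses to share one profile directory between two running instances, so run the scripts one after the other rather than concurrently.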

You have other options too, such as copying only the cookies and Web Storage (localStorage and sessionStorage).

Copying cookies using puppeteer

With the Puppeteer API, this process is very painful: you have to specify every origin you want to copy the cookies from. For example, if your site embeds third-party things, like Google Sign-In or tracking, you have to copy cookies from "google.com", ".google.com", "www.google.com", etc. It's tedious and easy to get wrong. Anyway, to copy the cookies of origin https://a.b.c, call: const abcCookies = await page.cookies('https://a.b.c'); To restore them: await page.setCookie(...abcCookies);. Since they are plain JSON objects, you can serialize them and save them to disk, to restore later.
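The same steps written as a pair of helpers: a sketch assuming a Puppeteer page object, where the file name and the URL list are up to you (saveCookies and restoreCookies are hypothetical names). It only uses page.cookies(...urls) and page.setCookie(...cookies), as in the answer.

```javascript
const fs = require('fs/promises');

// Save the cookies visible to page.cookies() for the given URLs as JSON.
// With no URLs, page.cookies() returns the cookies of the page's current URL.
async function saveCookies(page, file, urls = []) {
  const cookies = await page.cookies(...urls);
  await fs.writeFile(file, JSON.stringify(cookies, null, 2));
  return cookies;
}

// Read the file back and hand every cookie to page.setCookie(),
// e.g. in a later script before navigating to the site again.
async function restoreCookies(page, file) {
  const cookies = JSON.parse(await fs.readFile(file, 'utf8'));
  if (cookies.length > 0) {
    await page.setCookie(...cookies);
  }
  return cookies;
}
```

Usage: await saveCookies(page, 'cookies.json', ['https://a.b.c', 'https://www.google.com']) in the first script, then await restoreCookies(page, 'cookies.json') in the second, before page.goto().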

Copying cookies using CDP

const client = await page.target().createCDPSession(); // public alternative to the private page._client
let { cookies } = await client.send('Network.getAllCookies');

Reference: Network.getAllCookies

To restore them, use the CDP method Network.setCookies. Again, you can serialize those cookies and save them to disk to restore later.
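A sketch of that round trip through CDP, assuming a Puppeteer page. It opens its own session with page.target().createCDPSession() instead of the private page._client (recent Puppeteer versions also offer page.createCDPSession()). The field whitelist in the hypothetical toCookieParam helper is an assumption: it copies only the core CookieParam fields and deliberately leaves out newer ones such as partitionKey.

```javascript
const fs = require('fs/promises');

// Keep only fields that Network.setCookies accepts (CookieParam); read-only
// fields from Network.getAllCookies such as size and session are dropped.
function toCookieParam(c) {
  const param = {
    name: c.name, value: c.value, domain: c.domain, path: c.path,
    secure: c.secure, httpOnly: c.httpOnly,
  };
  if (c.sameSite) param.sameSite = c.sameSite;
  // Session cookies report expires = -1; leaving expires unset keeps them session cookies.
  if (!c.session) param.expires = c.expires;
  return param;
}

async function saveAllCookies(page, file) {
  const client = await page.target().createCDPSession();
  try {
    const { cookies } = await client.send('Network.getAllCookies');
    await fs.writeFile(file, JSON.stringify(cookies));
    return cookies;
  } finally {
    await client.detach();
  }
}

async function restoreAllCookies(page, file) {
  const cookies = JSON.parse(await fs.readFile(file, 'utf8'));
  const client = await page.target().createCDPSession();
  try {
    await client.send('Network.setCookies', { cookies: cookies.map(toCookieParam) });
    return cookies;
  } finally {
    await client.detach();
  }
}
```

Unlike page.cookies(), this picks up third-party cookies (google.com and friends) without listing their origins.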

Copying Storage (localStorage and sessionStorage)

You can transfer your own origin's Storage via const ls = await page.evaluate(() => JSON.stringify(localStorage)); and const ss = await page.evaluate(() => JSON.stringify(sessionStorage));. However, you can't access other origins' Storage this way, for security reasons. I didn't find a CDP equivalent; the experimental DOMStorage domain (e.g. DOMStorage.getDOMStorageItems) may be worth a look.
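A sketch that wraps both snapshots and the restore step, assuming a Puppeteer page that is on the site's origin when saving and again when restoring (saveWebStorage and restoreWebStorage are hypothetical names). The origin is stored alongside the data so the restore can refuse to write into the wrong origin.

```javascript
const fs = require('fs/promises');

// Snapshot both Storage areas of the page's *current* origin.
async function saveWebStorage(page, file) {
  const snapshot = await page.evaluate(() => ({
    origin: location.origin,
    local: JSON.stringify(localStorage),
    session: JSON.stringify(sessionStorage),
  }));
  await fs.writeFile(file, JSON.stringify(snapshot));
  return snapshot;
}

// Write the snapshot back. The page must already be on the same origin
// (navigate there first), because Storage is scoped per origin.
async function restoreWebStorage(page, file) {
  const snapshot = JSON.parse(await fs.readFile(file, 'utf8'));
  await page.evaluate((s) => {
    if (location.origin !== s.origin) {
      throw new Error(`expected origin ${s.origin}, got ${location.origin}`);
    }
    for (const [k, v] of Object.entries(JSON.parse(s.local))) localStorage.setItem(k, v);
    for (const [k, v] of Object.entries(JSON.parse(s.session))) sessionStorage.setItem(k, v);
  }, snapshot);
  return snapshot;
}
```

Typical use in the second script: await page.goto('https://example.com'); await restoreWebStorage(page, 'storage.json'); await page.reload(); so the site's scripts read the restored values on load.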

Web Cache

If your site has a service worker, chances are that it saves things with the Cache API. I don't know whether it makes sense to save this cached data, but if it is important to you, you can transfer that cache too, just not with Puppeteer APIs or CDP. You have to use the Cache API yourself and transfer the cache using page.evaluate.
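A rough sketch of that idea, assuming the cached entries belong to the page's current origin and that the page is on that origin for both steps (saveCacheStorage and restoreCacheStorage are hypothetical names). Bodies are base64-encoded so binary responses survive JSON; opaque responses (e.g. cross-origin no-cors fetches) are skipped because their bodies can't be read. Request methods, request headers and Vary matching are not preserved, so treat it as a starting point.

```javascript
const fs = require('fs/promises');

async function saveCacheStorage(page, file) {
  // Runs in the page: dump every Cache Storage entry of the current origin.
  const dump = await page.evaluate(async () => {
    const toBase64 = (buffer) => {
      let binary = '';
      for (const byte of new Uint8Array(buffer)) binary += String.fromCharCode(byte);
      return btoa(binary);
    };
    const out = {};
    for (const cacheName of await caches.keys()) {
      const cache = await caches.open(cacheName);
      out[cacheName] = [];
      for (const request of await cache.keys()) {
        const response = await cache.match(request);
        // Opaque responses have status 0 and an unreadable body.
        if (!response || response.type.startsWith('opaque')) continue;
        out[cacheName].push({
          url: request.url,
          status: response.status,
          headers: [...response.headers],
          body: toBase64(await response.arrayBuffer()),
        });
      }
    }
    return out;
  });
  await fs.writeFile(file, JSON.stringify(dump));
  return dump;
}

async function restoreCacheStorage(page, file) {
  const dump = JSON.parse(await fs.readFile(file, 'utf8'));
  // Runs in the page: recreate each cache and put the responses back.
  await page.evaluate(async (data) => {
    for (const [cacheName, entries] of Object.entries(data)) {
      const cache = await caches.open(cacheName);
      for (const entry of entries) {
        const bytes = Uint8Array.from(atob(entry.body), (c) => c.charCodeAt(0));
        // Null-body statuses such as 204 must not be given a body.
        const body = bytes.length > 0 ? bytes : null;
        await cache.put(entry.url, new Response(body, { status: entry.status, headers: entry.headers }));
      }
    }
  }, dump);
  return dump;
}
```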

IndexedDB

If you want to copy IndexedDB contents, you can use the CDP IndexedDB domain methods (like "IndexedDB.requestData") to get the data for any origin, but you can't set/restore data that way. :) You can, however, restore the data programmatically in your own origin using page.evaluate.
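A sketch of the read side through CDP, assuming a Puppeteer page; the origin passed in (e.g. 'https://example.com') and the describeIndexedDB name are hypothetical. It only lists each database's version and object stores. IndexedDB.requestData can then page through the entries (skipCount, pageSize), but it returns Runtime.RemoteObject handles rather than plain JSON, so turning them into serializable data takes extra work.

```javascript
// Lists databases and object stores for an origin via the CDP IndexedDB domain.
async function describeIndexedDB(page, securityOrigin) {
  const client = await page.target().createCDPSession();
  try {
    await client.send('IndexedDB.enable');
    const { databaseNames } = await client.send('IndexedDB.requestDatabaseNames', { securityOrigin });
    const result = {};
    for (const databaseName of databaseNames) {
      const { databaseWithObjectStores } = await client.send('IndexedDB.requestDatabase', {
        securityOrigin,
        databaseName,
      });
      result[databaseName] = {
        version: databaseWithObjectStores.version,
        objectStores: databaseWithObjectStores.objectStores.map((store) => store.name),
      };
    }
    return result;
  } finally {
    await client.detach();
  }
}
```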

answered Sep 20 '22 at 10:09 by lcrespilho


lcrespilho's answer is very valuable. He leaves two exercises for the reader, and I have done one of them: IndexedDB.

Copying IndexedDB

He writes:

If you want to copy IndexedDB contents, you can use the cdp IndexedDB domain methods (like "IndexedDB.requestData") to get the data for any origin, but you can't set/restore this data. :) You can however, in your own origin, restore the data programmatically using page.evaluate.

I have worked out how to read the data:

const indexedDB = await page.evaluate(async () => {
  const result = {};
  // Lists the databases of the page's current origin.
  const databases = await window.indexedDB.databases();

  // Open an existing database at its current version.
  const connect = (database) => new Promise((resolve, reject) => {
    const request = window.indexedDB.open(database.name, database.version);
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });

  // Read every value stored in one object store.
  const getAll = (db, objectStoreName) => new Promise((resolve, reject) => {
    const request = db.transaction([objectStoreName], 'readonly').objectStore(objectStoreName).getAll();
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });

  for (let i = 0; i < databases.length; i++) {
    const db = await connect(databases[i]);
    const dbName = db.name;
    result[dbName] = {};
    for (let j = 0; j < db.objectStoreNames.length; j++) {
      const objectStoreName = db.objectStoreNames[j];
      result[dbName][objectStoreName] = await getAll(db, objectStoreName);
    }
    db.close();
  }
  return result;
});

I hope this helps anyone.
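The other half of the exercise, restoring, might look like the sketch below; restoreIndexedDB is a hypothetical name. It takes the object produced by the dump above and writes it back with page.evaluate. It assumes the page is already on the same origin, that the databases and object stores already exist (for example because the site creates them on load), and that each store uses in-line keys (a keyPath). getAll() only returns values, so stores with out-of-line keys cannot be rebuilt and are reported as skipped.

```javascript
// Writes a { dbName: { storeName: [values] } } dump back into the current origin.
// Returns the "db/store" paths that could not be restored.
async function restoreIndexedDB(page, dump) {
  return page.evaluate(async (data) => {
    const skipped = [];
    const existing = new Set((await window.indexedDB.databases()).map((d) => d.name));
    const open = (name) => new Promise((resolve, reject) => {
      const request = window.indexedDB.open(name); // no version: opens the current one
      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
    for (const [dbName, stores] of Object.entries(data)) {
      if (!existing.has(dbName)) {
        // Opening a missing database would silently create an empty one.
        skipped.push(...Object.keys(stores).map((s) => `${dbName}/${s}`));
        continue;
      }
      const db = await open(dbName);
      for (const [storeName, values] of Object.entries(stores)) {
        if (!db.objectStoreNames.contains(storeName)) {
          skipped.push(`${dbName}/${storeName}`);
          continue;
        }
        const tx = db.transaction([storeName], 'readwrite');
        const store = tx.objectStore(storeName);
        if (store.keyPath === null) {
          tx.abort(); // out-of-line keys: the dump has no keys to put back
          skipped.push(`${dbName}/${storeName}`);
          continue;
        }
        values.forEach((value) => store.put(value));
        await new Promise((resolve, reject) => {
          tx.oncomplete = () => resolve();
          tx.onerror = () => reject(tx.error);
          tx.onabort = () => reject(tx.error);
        });
      }
      db.close();
    }
    return skipped;
  }, dump);
}
```

Usage: after page.goto() to the same site in the second script, call await restoreIndexedDB(page, indexedDB) with the object saved by the dump (e.g. serialized to disk with JSON.stringify in between).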

answered Sep 20 '22 at 10:09 by mevdschee