
Puppeteer - how can I accept cookie consent prompts automatically for any URL?

When I take a screenshot of a website with Puppeteer, cookie consent prompts show up in the image. I want to dismiss or accept these prompts before taking the screenshot. The problem is that most websites implement the cookie prompt differently, so it's difficult to target them reliably.

How can I best target and dismiss these prompts using Puppeteer?

drs asked Jan 06 '20 20:01


2 Answers

I don't believe there is a general way of doing this, as these prompts are elements like any other element on the page. Having said that, there are some attempts to block them with extensions or filter lists that you can try:

  • https://www.i-dont-care-about-cookies.eu/
  • http://prebake.eu/

I haven't tested any of these and do not know whether they're effective.

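Keep in mind that headless Chrome doesn't support extensions, so the browser has to be launched with headless: false. Loading an extension in Puppeteer:

const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: false, // extensions are not loaded in headless mode
  args: [
    '--disable-extensions-except=/path/to/manifest/folder/',
    '--load-extension=/path/to/manifest/folder/',
  ]
});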
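
If loading an extension is not an option, the filter-list idea can be roughly approximated with Puppeteer's request interception (page.setRequestInterception, request.abort, request.continue). This is only a sketch: the URL fragments below are made-up placeholders (only eurocookie is mentioned in this thread), and shouldBlock and blockConsentScripts are hypothetical helper names. It only helps on sites where the banner comes from a separately loaded script.

```javascript
// Hypothetical URL fragments of consent-banner scripts; replace them with
// entries taken from a real filter list.
const BLOCKED_FRAGMENTS = ['cookie-consent', 'cookiebanner', 'eurocookie'];

// Returns true when the request URL contains one of the blocked fragments.
function shouldBlock(url, fragments = BLOCKED_FRAGMENTS) {
  const lower = url.toLowerCase();
  return fragments.some((fragment) => lower.includes(fragment));
}

// Aborts matching requests on a Puppeteer page and lets everything else through.
// Call this before page.goto().
async function blockConsentScripts(page, fragments = BLOCKED_FRAGMENTS) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.url(), fragments)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Usage would be await blockConsentScripts(page); followed by await page.goto(url); and the screenshot.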
mbit answered Oct 02 '22 15:10


Update: A more general way to combat cookie consents with headless puppeteer

This approach is nowhere near complete either, but shows an efficient way to eliminate cookie consent pop-ups in a less specific way. It uses language and generalized selectors to detect consent buttons and links rather than solely relying on exact selectors for each website.

In the following example I target a and button elements inside a container whose id or class contains the string cookie. Restricting the search to such containers keeps the script from accidentally clicking unrelated elements elsewhere on the page.

The button text is then matched against a case-insensitive regular expression that lists labels commonly used to accept cookies. The example uses German labels; for English, replace the pattern with ^(Accept all|Accept|I understand|Agree|Okay|OK)$, or translate it into any language of your choice.

await page.evaluate(_ => {
    function xcc_contains(selector, text) {
        var elements = document.querySelectorAll(selector);
        return Array.prototype.filter.call(elements, function(element){
            return RegExp(text, "i").test(element.textContent.trim());
        });
    }
    var _xcc;
    _xcc = xcc_contains('[id*=cookie] a, [class*=cookie] a, [id*=cookie] button, [class*=cookie] button', '^(Alle akzeptieren|Akzeptieren|Verstanden|Zustimmen|Okay|OK)$');
    if (_xcc != null && _xcc.length != 0) { _xcc[0].click(); }
});
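
To switch languages without editing the pattern by hand, the regular expression can be built from a plain list of labels and handed to page.evaluate as an argument. This is a sketch under assumptions: escapeRegExp, buildAcceptPattern and clickAccept are hypothetical helper names, and the label list simply merges the German and English labels from this answer.

```javascript
// Labels from this answer (English and German); extend as needed.
const ACCEPT_LABELS = [
  'Accept all', 'Accept', 'I understand', 'Agree', 'Okay', 'OK',
  'Alle akzeptieren', 'Akzeptieren', 'Verstanden', 'Zustimmen',
];

// Escapes regex metacharacters so labels such as "OK." match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds an anchored alternation, e.g. ^(Accept all|OK)$
function buildAcceptPattern(labels) {
  return '^(' + labels.map(escapeRegExp).join('|') + ')$';
}

// Clicks the first matching link or button inside a "cookie" container.
// Resolves to true if something was clicked.
async function clickAccept(page, labels = ACCEPT_LABELS) {
  const pattern = buildAcceptPattern(labels);
  return page.evaluate((source) => {
    const re = new RegExp(source, 'i');
    const candidates = document.querySelectorAll(
      '[id*=cookie] a, [class*=cookie] a, [id*=cookie] button, [class*=cookie] button');
    const match = Array.from(candidates).find((el) => re.test(el.textContent.trim()));
    if (match) { match.click(); }
    return Boolean(match);
  }, pattern);
}
```

The pattern is passed as a string because page.evaluate can only transfer serializable values into the page, so the RegExp object is created inside the browser context.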

Old Answer:

There is indeed no general way to handle cookie consent pop-ups, as they vary greatly, and even the Chrome extensions won't handle all of them. However, you can replicate what the extensions do and maintain your own list by evaluating JS code on the target site before taking a screenshot.

In my case I simply accept them all, since I want this to work in headless mode. Add more selectors as you identify them. You could use the selectors of the dismiss buttons instead, if you prefer.

Below are some real-world scenarios that should help you get going:

  • handle ids, classes and custom data attributes
  • hide iframes, as code on a different domain cannot be evaluated

await page.evaluate(_ => {
    var xcc;
    // ids
    var xcc_id = [
        'borlabsCookieOptionAll',
        'cookie-apply-all',
        'cookie-settings-all',
        // add ids here
    ];
    for (let i = 0; i < xcc_id.length; i++) {
        xcc = document.getElementById(xcc_id[i]);
        if (xcc != null) {
            xcc.click();
        }
    }
    // classes
    var xcc_class = [
        'accept-all',
        'accept-cookies-button',
        'avia-cookie-select-all',
        // add classes here
    ];
    for (let i = 0; i < xcc_class.length; i++) {
        xcc = document.getElementsByClassName(xcc_class[i]);
        if (xcc != null && xcc.length != 0) {
            xcc[0].click();
        }
    }

    // custom data attributes
    xcc = document.querySelectorAll('[data-cookieman-accept-all]'); if (xcc != null && xcc.length != 0) { xcc[0].click(); }

    // hide iframes, their cross-origin content can't be evaluated
    document.querySelectorAll("iframe[src*=eurocookie]").forEach(function (frame) { frame.style.display = 'none'; });

});
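
One possible way to keep such a list compact is to turn the ids, classes and data attributes into a single CSS selector and query the page once. This is a sketch, not a drop-in replacement: buildAcceptSelector and clickFirstKnownButton are hypothetical names, the entries are assumed to be simple identifiers that need no CSS escaping, and unlike the code above it clicks only the first match on the page.

```javascript
// Entries taken from the lists in this answer; add your own.
const KNOWN_BUTTONS = {
  ids: ['borlabsCookieOptionAll', 'cookie-apply-all', 'cookie-settings-all'],
  classes: ['accept-all', 'accept-cookies-button', 'avia-cookie-select-all'],
  attributes: ['data-cookieman-accept-all'],
};

// Joins all entries into one selector: "#id, .class, [attribute]".
function buildAcceptSelector({ ids = [], classes = [], attributes = [] }) {
  return [
    ...ids.map((id) => '#' + id),
    ...classes.map((name) => '.' + name),
    ...attributes.map((attr) => '[' + attr + ']'),
  ].join(', ');
}

// Clicks the first element matching any known selector.
// Resolves to true if something was clicked.
async function clickFirstKnownButton(page, lists = KNOWN_BUTTONS) {
  const selector = buildAcceptSelector(lists);
  if (!selector) { return false; }
  return page.evaluate((sel) => {
    const button = document.querySelector(sel);
    if (button) { button.click(); }
    return Boolean(button);
  }, selector);
}
```

The lists stay easy to sort and deduplicate in an editor, and the browser does the matching in a single querySelector call.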

There is surely a more elegant way of doing this, but this way I was able to quickly organize my list, make changes on the fly, and sort and deduplicate entries in the code editor by keeping them as one-liners or in arrays.

Alternatively, just use the { headless: false } option and load an extension that does it for you, as suggested in the other answer. Cheers.

Side note: Interacting with cookie consent pop-ups can break your code if the page reloads (page navigation error). To work around this, I use a fixed delay of 3000-4000 ms after await page.evaluate( ... ):

const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
await delay(3500);

The delay also catches plenty of meta refreshes and JS redirects, and gives large resources some extra time to load.
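
A variation on the fixed delay, as a sketch: start page.waitForNavigation before clicking and treat its timeout as "no reload happened". If the page reloads, this continues as soon as the load event fires; if not, it waits for the full timeout just like the delay. waitForPossibleReload and acceptAndSettle are hypothetical helper names, and clickConsent stands for whatever function performs the page.evaluate click.

```javascript
// Resolves to true if a navigation finished within maxMs, false otherwise.
async function waitForPossibleReload(page, maxMs = 3500) {
  try {
    await page.waitForNavigation({ waitUntil: 'load', timeout: maxMs });
    return true;
  } catch (err) {
    // No navigation within maxMs (or the navigation failed): carry on.
    return false;
  }
}

// Starts listening for a reload before the click, so a fast reload is not missed.
async function acceptAndSettle(page, clickConsent, maxMs = 3500) {
  const reload = waitForPossibleReload(page, maxMs);
  await clickConsent(page);
  return reload;
}
```

Note that, unlike the fixed delay, this does not add extra loading time after a reload has completed, so a short additional delay may still be useful for slow resources.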

mountarreat answered Oct 02 '22 14:10