 

How to disable Images/CSS in Pyppeteer?

How can I disable loading of images/CSS in Pyppeteer? I've seen this Puppeteer tutorial https://www.scrapehero.com/how-to-increase-web-scraping-speed-using-puppeteer/ but I don't know how to translate it to Python.

asked Dec 24 '19 at 16:12 by Leo





2 Answers

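Based on the example in https://github.com/miyakogi/pyppeteer/blob/dev/pyppeteer/page.py#L312:

    import asyncio

    # Run inside an async function; `page` is an existing pyppeteer Page
    await page.setRequestInterception(True)

    async def intercept(request):
        # Abort stylesheets, images and fonts; let every other request continue
        if request.resourceType in ('stylesheet', 'image', 'font'):
            await request.abort()
        else:
            await request.continue_()

    # Schedule the coroutine handler on the event loop for each request
    page.on('request', lambda req: asyncio.ensure_future(intercept(req)))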
answered Nov 15 '22 at 07:11 by mbit

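A minimal sketch building on the Pyppeteer answer above (not part of it): the blocked types are pulled into a set by a hypothetical `make_interceptor` factory, and the handler is exercised offline against a hypothetical `FakeRequest` stub that mimics only the `resourceType`, `abort()` and `continue_()` members the answer uses, so the decision logic can be checked without launching a browser.

```python
import asyncio


def make_interceptor(blocked_types):
    """Return a request handler that aborts requests whose resourceType is in blocked_types."""
    blocked = frozenset(blocked_types)

    async def intercept(request):
        # Abort blocked resource types; let every other request continue
        if request.resourceType in blocked:
            await request.abort()
        else:
            await request.continue_()

    return intercept


class FakeRequest:
    """Hypothetical stand-in for a pyppeteer Request, so the handler can be tested offline."""

    def __init__(self, resource_type):
        self.resourceType = resource_type
        self.outcome = None

    async def abort(self):
        self.outcome = 'aborted'

    async def continue_(self):
        self.outcome = 'continued'


async def demo():
    intercept = make_interceptor(('stylesheet', 'image', 'font'))
    requests = [FakeRequest(t) for t in ('document', 'stylesheet', 'image', 'script', 'font')]
    for req in requests:
        await intercept(req)
    return {req.resourceType: req.outcome for req in requests}


results = asyncio.run(demo())
print(results)
# {'document': 'continued', 'stylesheet': 'aborted', 'image': 'aborted', 'script': 'continued', 'font': 'aborted'}
```

With a real page, the returned handler would be registered the same way as in the answer: `await page.setRequestInterception(True)` followed by `page.on('request', lambda req: asyncio.ensure_future(intercept(req)))`. Keeping the types in a set lets you change what gets blocked without touching the handler.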


In JavaScript (Node.js Puppeteer), the code below blocks requests by resource type: fetch, image, media, and font. It does not block CSS; add 'stylesheet' to the list for that.

    // Resource types to block; add 'stylesheet' to drop CSS as well
    const blocked = ['fetch', 'image', 'media', 'font']

    // Must be awaited inside an async function, before navigation
    await page.setRequestInterception(true)

    page.on('request', request => {
        if (blocked.includes(request.resourceType())) {
            request.abort()
        } else {
            request.continue()
        }
    })
answered Nov 15 '22 at 06:11 by Edi Imanto

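Since the question asks for Python, here is a rough Pyppeteer translation of the JavaScript answer above, assuming the same `resourceType`, `abort()` and `continue_()` members used in the Pyppeteer answer. `block_resources`, `enable_blocking`, `FakePage` and `FakeRequest` are hypothetical names; the two fakes only stand in for pyppeteer objects so the wiring can be checked without launching Chromium.

```python
import asyncio

# Resource types blocked by the JavaScript answer; add 'stylesheet' to drop CSS too
BLOCKED_TYPES = frozenset({'fetch', 'image', 'media', 'font'})


async def block_resources(request):
    # Same decision as the JS handler: abort blocked types, continue the rest
    if request.resourceType in BLOCKED_TYPES:
        await request.abort()
    else:
        await request.continue_()


async def enable_blocking(page):
    # Await interception first (the JS snippet omits the await)
    await page.setRequestInterception(True)
    # Wrap the coroutine in a task for each request, as the Pyppeteer answer does
    page.on('request', lambda req: asyncio.ensure_future(block_resources(req)))


class FakePage:
    """Hypothetical stand-in for a pyppeteer Page: records handlers and replays events."""

    def __init__(self):
        self.interception = False
        self.handlers = {}

    async def setRequestInterception(self, value):
        self.interception = value

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def emit(self, event, *args):
        # Return whatever the handlers return (here: the scheduled tasks)
        return [handler(*args) for handler in self.handlers.get(event, [])]


class FakeRequest:
    """Hypothetical stand-in for a pyppeteer Request."""

    def __init__(self, resource_type):
        self.resourceType = resource_type
        self.outcome = None

    async def abort(self):
        self.outcome = 'aborted'

    async def continue_(self):
        self.outcome = 'continued'


async def demo():
    page = FakePage()
    await enable_blocking(page)
    requests = [FakeRequest(t) for t in ('document', 'fetch', 'stylesheet', 'media')]
    tasks = []
    for req in requests:
        tasks.extend(page.emit('request', req))
    await asyncio.gather(*tasks)  # wait for the scheduled handlers to finish
    return page.interception, {req.resourceType: req.outcome for req in requests}


interception, outcomes = asyncio.run(demo())
print(interception, outcomes)
# True {'document': 'continued', 'fetch': 'aborted', 'stylesheet': 'continued', 'media': 'aborted'}
```

With a real browser you would instead call `await enable_blocking(page)` on a page obtained from `browser.newPage()`, before `page.goto(...)`. Note that blocking `fetch` can break pages that load their content through the Fetch API.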