 

Website access denied using puppeteer on cloud functions

I am trying to scrape this URL https://www.myntra.com/laptop-bag/chumbak/chumbak-unisex-brown-geo-bird--printed-laptop-bag/6795882/buy using Puppeteer. It works when I use { headless: false }, but fails in headless mode.

I then compared the response in both cases using this:

const resp = await page.goto(url);
console.log(resp);
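Logging the whole response object prints Puppeteer's internal fields (such as `_headers`). A more focused comparison is to record `resp.status()` and `resp.headers()` in each mode and diff them. The sketch below is plain Node with no Puppeteer dependency; `summarizeResponse` and `diffSummaries` are hypothetical helper names, the choice of headers to keep is an assumption, and the headful values in the example are hypothetical.

```javascript
// Hypothetical helpers for comparing a headful and a headless run.
// Feed them the values returned by Puppeteer's resp.status() and
// resp.headers() (Puppeteer lower-cases header names).

// Keep only the fields that usually matter when a request is blocked.
function summarizeResponse(status, headers) {
  return {
    status,
    server: headers['server'] || null,
    contentType: headers['content-type'] || null,
    contentLength: headers['content-length'] || null,
  };
}

// Return the keys whose values differ between two summaries.
function diffSummaries(a, b) {
  return Object.keys(a).filter((key) => a[key] !== b[key]);
}

// Hypothetical headful values vs. the blocked headless response from the log.
const headful = summarizeResponse(200, { 'content-type': 'text/html' });
const headless = summarizeResponse(403, {
  server: 'AkamaiGHost',
  'content-type': 'text/html',
  'content-length': '395',
});
console.log(diffSummaries(headful, headless)); // [ 'status', 'server', 'contentLength' ]
```

In the real script, `summarizeResponse(resp.status(), resp.headers())` would be called right after `page.goto(url)` in each mode.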

I then found that a user agent has to be set when running in headless mode, so I added this:

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
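The hard-coded string above claims Chrome 61, which probably does not match the Chromium build Puppeteer actually launches; whether a given bot filter checks for that mismatch is an assumption, not something shown in this question. One alternative is to take the browser's own user agent from `browser.userAgent()` and drop the `Headless` marker that headless Chrome adds. The helper name `toHeadfulUserAgent` and the sample string are hypothetical.

```javascript
// Hypothetical helper: turn headless Chrome's default user agent into the
// string a headful browser of the same version would send.
// Headless Chrome reports "HeadlessChrome/<version>" instead of "Chrome/<version>".
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace(/HeadlessChrome\//, 'Chrome/');
}

// In a Puppeteer script this would be used roughly as:
//   const ua = await browser.userAgent();
//   await page.setUserAgent(toHeadfulUserAgent(ua));

// Illustrative user agent string, not captured from a real run.
const sample =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/84.0.4147.0 Safari/537.36';
console.log(toHeadfulUserAgent(sample));
// Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.0 Safari/537.36
```

A user agent that already lacks the marker is returned unchanged, so the helper is safe to call in headful mode too.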

Now it works locally in both modes, but when I deploy it to Cloud Functions it still fails.

This is the screenshot taken using Puppeteer.

This is part of the response log:

_headers: 
   { status: '403',
     server: 'AkamaiGHost',
     'mime-version': '1.0',
     'content-type': 'text/html',
     'content-length': '395',
     expires: 'Thu, 09 Jul 2020 12:16:30 GMT',
     date: 'Thu, 09 Jul 2020 12:16:30 GMT',
     'set-cookie': 'AKA_A2=A; expires=Thu, 09-Jul-2020 13:16:30 GMT........
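The log shows a `403` served by `AkamaiGHost`, which suggests the request is rejected at Akamai's edge before it reaches the site's own servers. A small check like the one below (hypothetical name `looksLikeAkamaiBlock`) can let a scraper fail fast or retry through a different exit IP instead of parsing the error page; treating every 403 from `AkamaiGHost` as a bot block is an assumption.

```javascript
// Hypothetical check: does this response look like an Akamai edge block?
// `status` is the number from resp.status(); `headers` is expected in the
// lower-cased form returned by resp.headers().
function looksLikeAkamaiBlock(status, headers) {
  const server = (headers['server'] || '').toLowerCase();
  return status === 403 && server.includes('akamaighost');
}

// Values taken from the response log above.
const blocked = looksLikeAkamaiBlock(403, {
  server: 'AkamaiGHost',
  'content-type': 'text/html',
  'content-length': '395',
});
console.log(blocked); // true
```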

Am I missing anything?

Thanks.

Update:

I have tried the puppeteer-extra stealth plugin together with IP rotation. Here is the code:

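// puppeteer-extra wraps puppeteer so that plugins can be registered
const puppeteer = require('puppeteer-extra');

// Stealth plugin: patches common headless-detection signals
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

// Adblocker plugin: also blocks known tracker requests
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));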

And for IP rotation:

// Launch headless Chrome with all traffic sent through the ProxyMesh endpoint
const browser = await puppeteer.launch({
  headless: true,
  args: ['--proxy-server=abcd-efg.proxymesh.com:12345']
});

const page = await browser.newPage();

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');

// Proxy credentials; call this before page.goto() so the first request can authenticate
await page.authenticate({
  username: 'myusername',
  password: 'mypassword'
});
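ProxyMesh is a rotating proxy service, so the single endpoint above should already change the exit IP on its side. If a provider instead hands out a list of endpoints, rotation can be done in the script by launching each browser with the next endpoint in the list. The round-robin rotator below is a hypothetical sketch; the host names are placeholders and the selection strategy is an assumption.

```javascript
// Hypothetical round-robin rotation over a list of proxy endpoints.
// Each call to next() returns launch options for puppeteer.launch().
function createProxyRotator(endpoints) {
  if (endpoints.length === 0) {
    throw new Error('at least one proxy endpoint is required');
  }
  let index = 0;
  return {
    next() {
      const endpoint = endpoints[index];
      index = (index + 1) % endpoints.length; // wrap around to the first endpoint
      return { headless: true, args: [`--proxy-server=${endpoint}`] };
    },
  };
}

// Placeholder endpoints, not real proxies.
const rotator = createProxyRotator(['proxy-a.example.com:8000', 'proxy-b.example.com:8000']);
console.log(rotator.next().args[0]); // --proxy-server=proxy-a.example.com:8000
console.log(rotator.next().args[0]); // --proxy-server=proxy-b.example.com:8000
console.log(rotator.next().args[0]); // --proxy-server=proxy-a.example.com:8000

// Usage in the scraper (sketch):
//   const browser = await puppeteer.launch(rotator.next());
```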

IP rotation works locally, but requests are still blocked on Cloud Functions.

vjnan369 asked Jul 09 '20 16:07




1 Answer

Using residential proxies fixed the issue.

Initially I deployed to Cloud Functions and AWS Lambda with IP rotation, using the ProxyMesh service. ProxyMesh only provides data center proxies, and those were still blocked. I then tried residential proxies from another service, and that worked.
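Switching providers usually means swapping the proxy endpoint and credentials. Chrome's `--proxy-server` flag does not use credentials embedded in the proxy URL, so credentials from a provider URL of the form `http://user:pass@host:port` have to be passed separately through `page.authenticate()`. The sketch below splits such a URL with Node's built-in `URL` class; the function name `splitProxyUrl` and the example URL are hypothetical, and your provider's URL format may differ.

```javascript
// Hypothetical helper: split "scheme://user:pass@host:port" into the value for
// Chrome's --proxy-server flag and the credentials for page.authenticate().
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    server: `${url.protocol}//${url.host}`, // url.host includes a non-default port
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
  };
}

// Placeholder URL, not a real provider endpoint.
const proxy = splitProxyUrl('http://myusername:mypassword@residential.example.com:8000');
console.log(proxy.server); // http://residential.example.com:8000

// Usage with Puppeteer (sketch):
//   const browser = await puppeteer.launch({
//     headless: true,
//     args: [`--proxy-server=${proxy.server}`],
//   });
//   const page = await browser.newPage();
//   await page.authenticate({ username: proxy.username, password: proxy.password });
```

Decoding the username and password handles providers whose credentials contain characters such as `@` or `:` that must be percent-encoded in the URL.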

vjnan369 answered Oct 17 '22 18:10