Puppeteer Bright Data proxy returning ERR_NO_SUPPORTED_PROXY or CERT errors

So I went on Bright Data, made an account, and set up their Search Engine Crawler proxy. Here's my scraping function:

// Dependencies (not shown in the original snippet):
const puppeteerExtra = require('puppeteer-extra');
const proxyChain = require('proxy-chain');

async function scrape() {
  try {
    const preparePageForTests = async (page) => {

          const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36';//'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';

          await page.setUserAgent(userAgent);

          await page.evaluateOnNewDocument(() => {
            Object.defineProperty(navigator, 'webdriver', {
              get: () => false,
            });
          });

          // Pass the Chrome Test.
          await page.evaluateOnNewDocument(() => {
            // We can mock this in as much depth as we need for the test.
            window.navigator.chrome = {
              app: {
                isInstalled: false,
              },
              webstore: {
                onInstallStageChanged: {},
                onDownloadProgress: {},
              },
              runtime: {
                PlatformOs: {
                  MAC: 'mac',
                  WIN: 'win',
                  ANDROID: 'android',
                  CROS: 'cros',
                  LINUX: 'linux',
                  OPENBSD: 'openbsd',
                },
                PlatformArch: {
                  ARM: 'arm',
                  X86_32: 'x86-32',
                  X86_64: 'x86-64',
                },
                PlatformNaclArch: {
                  ARM: 'arm',
                  X86_32: 'x86-32',
                  X86_64: 'x86-64',
                },
                RequestUpdateCheckStatus: {
                  THROTTLED: 'throttled',
                  NO_UPDATE: 'no_update',
                  UPDATE_AVAILABLE: 'update_available',
                },
                OnInstalledReason: {
                  INSTALL: 'install',
                  UPDATE: 'update',
                  CHROME_UPDATE: 'chrome_update',
                  SHARED_MODULE_UPDATE: 'shared_module_update',
                },
                OnRestartRequiredReason: {
                  APP_UPDATE: 'app_update',
                  OS_UPDATE: 'os_update',
                  PERIODIC: 'periodic',
                },
              }
            };
          });

          await page.evaluateOnNewDocument(() => {
            const originalQuery = window.navigator.permissions.query;
            return window.navigator.permissions.query = (parameters) => (
              parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery(parameters)
            );
          });

          await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
              // This just needs to have `length > 0` for the current test,
              // but we could mock the plugins too if necessary.
              get: () => [1, 2, 3, 4, 5],
            });
          });

          await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
              get: () => ['en-US', 'en'],
            });
          });
        }

    // The proxy below is the Search Engine Crawler proxy from the Luminati/Bright Data
    // sign-up (credentials are placeholders). This returns ERR_CERT_INVALID or
    // ERR_CERT_AUTHORITY_INVALID.
    const oldProxyUrl = 'http://lum-customer-customerID-zone-zone1:password@zproxy.lum-superproxy.io:22225';
    const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl); // if this line is commented out, I get ERR_NO_SUPPORTED_PROXY

    const browser = await puppeteerExtra.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        `--proxy-server=${newProxyUrl}`
        // If I add 'ignoreHTTPSErrors: true' here then I can bypass the CERT errors,
        // but then it seems like I can't navigate the browser to a different page anymore.
      ]
    });

    const page = await browser.newPage();

    await preparePageForTests(page);

    await page.setViewport({ width: 1440, height: 1080 });

    await page.goto('https://www.google.com/search?q=concerts+near+new+york');

    await page.screenshot({ path: `screenshot.jpeg` });

  } catch (err) {
    console.log(err);
  }
}
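One detail worth noting about the launch call above: `ignoreHTTPSErrors` is a Puppeteer launch option, not a Chromium command-line flag, so it belongs next to `headless`, not inside `args`. A minimal sketch of the intended options object (the proxy URL is a placeholder):

```javascript
// Builds Puppeteer launch options. Note that ignoreHTTPSErrors sits at the
// top level of the options object; only actual Chromium flags go in args.
function buildLaunchOptions(proxyUrl) {
  return {
    headless: true,
    ignoreHTTPSErrors: true, // Puppeteer option, not a Chromium flag
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      `--proxy-server=${proxyUrl}`,
    ],
  };
}
```

This object would then be passed as `puppeteerExtra.launch(buildLaunchOptions(newProxyUrl))`.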

Not sure how to solve this. I believe the problem is with bypassing the CERT errors via ignoreHTTPSErrors. When I don't use a proxy at all, my analysis function (which essentially parses the first 'ul' list, seen below) works fine, but when I use the proxy it for some reason gives me the data from the second page.

Any help would be much appreciated!

The 'ul' is nicely formatted and the data is easy to get at: https://i.stack.imgur.com/RwiHM.jpg
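Since the analysis function itself isn't shown, here is a rough sketch of what pulling the items out of that first 'ul' might look like (the bare `ul` selector is an assumption standing in for whatever the real function targets):

```javascript
// Extracts the trimmed text of every <li> inside the first <ul> on the page.
// page.$eval runs the callback in the browser context against the first match.
async function firstListItems(page) {
  return page.$eval('ul', ul =>
    Array.from(ul.querySelectorAll('li')).map(li => li.textContent.trim())
  );
}
```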

Only a few 'ul' elements are visible, and after that I get a bunch of content I don't want returned. I tried doing a

page.$eval(".BXE0fe", element => element.click())

but that isn't redirecting the page for some reason: https://i.stack.imgur.com/3DTay.png
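One common reason a click like that appears to do nothing: the script doesn't wait for the navigation the click triggers, and `$eval`'s `element.click()` is a synthetic DOM click that some pages ignore. A hedged sketch using Puppeteer's real mouse click combined with `waitForNavigation` (`.BXE0fe` is the selector from the question):

```javascript
// Clicks a selector and waits for the resulting navigation to finish.
// Promise.all starts the navigation listener BEFORE the click so the
// navigation event can't be missed in the race between the two.
async function clickAndWait(page, selector) {
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(selector), // real mouse event, unlike $eval's element.click()
  ]);
}
```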

asked Jun 10 '21 by nickcoding2

1 Answer

Aside from the point Yevgeniy made about targeting Google (he's right, by the way: for Google you need to use their SERP product), two things stand out. If you're requesting over HTTPS, you need to have their CA certificate installed, and you should send requests through the Proxy Manager rather than hitting the Superproxy directly. And none of this would even matter for headless Chromium because of this.
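On the ERR_NO_SUPPORTED_PROXY side specifically: Chromium does not accept credentials embedded in the `--proxy-server` flag, which is why proxy-chain (or per-page authentication) is needed at all. A sketch of the `page.authenticate` alternative; `splitProxyUrl` is a helper written here for illustration, not part of any library:

```javascript
// Splits a credentialed proxy URL into the pieces Puppeteer needs:
// a bare host:port for --proxy-server, plus a username/password pair
// for page.authenticate (Chromium rejects user:pass@host in the flag).
function splitProxyUrl(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    server: `${u.protocol}//${u.hostname}:${u.port}`,
    username: decodeURIComponent(u.username),
    password: decodeURIComponent(u.password),
  };
}
```

Usage (assuming Puppeteer is available) would look roughly like:

    const { server, username, password } = splitProxyUrl(oldProxyUrl);
    const browser = await puppeteerExtra.launch({ args: [`--proxy-server=${server}`] });
    const page = await browser.newPage();
    await page.authenticate({ username, password });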

answered Sep 24 '22 by Lou