Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Selenium AWS Lambda Change WebGL Vendor/Renderer For Undetectable Headless Scraper

Concept:

Using AWS Lambda functions with Python and Selenium, I want to create a undetectable headless chrome scraper by passing a headless chrome test. I check the undetectability of my headless scraper by opening up the test and taking a screenshot. I ran this test on a Local IDE and on a Lambda server.


Implementation:

I will be using a python library called selenium-stealth and will follow their basic configuration:

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

I implemented this configuration on a Local IDE as well as an AWS Lambda Server to compare the results.


Local IDE:

Found below are the test results running on a local IDE: enter image description here


Lambda Server:

When I run this on a Lambda server, both the WebGL Vendor and Renderer are blank. as shown below:

enter image description here

I even tried to manually change the WebGL Vendor/Renderer using the following JavaScript command:

driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {"source": "WebGLRenderingContext.prototype.getParameter = function(parameter) {if (parameter === 37445) {return 'VENDOR_INPUT';}if (parameter === 37446) {return 'RENDERER_INPUT';}return getParameter(parameter);};"})

Then I thought maybe that it could be something wrong with the parameter number. I configured the command execution without the if statement, but the same thing happened: It worked on my Local IDE but had no effect on an AWS Lambda Server.

Simply Put:

Is it possible to add Vendor/Renderer on AWS Lambda? In my efforts, it seems that there is no possible way. I made sure to submit this issue on the selenium-stealth GitHub Repository.

like image 557
Luke Hamilton Avatar asked Dec 07 '21 18:12

Luke Hamilton


1 Answers

WebGL

WebGL is a cross-platform, open web standard for a low-level 3D graphics API based on OpenGL ES, exposed to ECMAScript via the HTML5 Canvas element. WebGL at it's core is a Shader-based API using GLSL, with constructs that are semantically similar to those of the underlying OpenGL ES API. It follows the OpenGL ES specification, with some exceptions for the out of memory-managed languages such as JavaScript. WebGL 1.0 exposes the OpenGL ES 2.0 feature set; WebGL 2.0 exposes the OpenGL ES 3.0 API.

Now, with the availability of Selenium Stealth building of Undetectable Scraper using Selenium driven ChromeDriver initiated google-chrome Browsing Context have become much more easier.


selenium-stealth

selenium-stealth is a python package selenium-stealth to prevent detection. This programme tries to make python selenium more stealthy. However, as of now selenium-stealth only support Selenium Chrome.

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium_stealth import stealth
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    
    # Selenium Stealth settings
    stealth(driver,
          languages=["en-US", "en"],
          vendor="Google Inc.",
          platform="Win32",
          webgl_vendor="Intel Inc.",
          renderer="Intel Iris OpenGL Engine",
          fix_hairline=True,
      )
    
    driver.get("https://bot.sannysoft.com/")
    
  • Browser Screenshot:

bot_sannysoft

You can find a detailed relevant discussion in Can a website detect when you are using Selenium with chromedriver?


Changing WebGL Vendor/Renderer in AWS Lambda

AWS Lambda enables us to deliver compressed WebGL websites to end users. When requested webpage objects are compressed, the transfer size is reduced, leading to faster downloads, lower cloud storage fees, and lower data transfer fees. Improved load times also directly influence the viewer experience and retention, which helps in improving website conversion and discoverability. Using WebGL, websites are more immersive while still being accessible via a browser URL. Through this technique AWS Lambda to automatically compress the objects uploaded to S3.

product-page-diagram_Lambda-RealTimeFileProcessing.a59577de4b6471674a540b878b0b684e0249a18c

Background on compression and WebGL

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization. This capability is negotiated between the server and the client using an HTTP header which may indicate that a resource being transferred, cached, or otherwise referenced is compressed. AWS Lambda on the server-side supports Content-Encoding header.

On the client-side, most browsers today support brotli and gzip compression through HTTP headers (Accept-Encoding: deflate, br, gzip) and can handle server response headers. This means browsers will automatically download and decompress content from a web server at the client-side, before rendering webpages to the viewer.


Conclusion

Due to this constraint you may not be able to change the WebGL Vendor/Renderer in AWS Lambda, else it may directly affect the process of rendering webpages to the viewers and can stand out to be a bottleneck in UX.


tl; dr

You can find a couple of relevant detailed discussion in:

  • Can a website detect when you are using Selenium with chromedriver?
like image 173
undetected Selenium Avatar answered Oct 01 '22 05:10

undetected Selenium