
Headless browser image quality - Headless Chrome, PhantomJS, SlimerJS

I'm looking for more information on what takes place under the hood in headless browsers. I've worked with several headless browsers in the past, such as SlimerJS, PhantomJS and Headless Chrome, to take screenshots of different sites.

I have never managed to generate a sharp, real-looking image that resembles what you see in the browser. It looks like a tool limitation, as if this were the maximum quality these tools can produce, but I want to understand why and, if possible, how to improve it.

Please compare the examples below.

  1. In this website, https://en.wikipedia.org/wiki/Main_Page, find the Wikipedia logo at the top-left corner.
  2. This is a screenshot of that logo taken by Headless Chrome through Puppeteer:

[Image: screenshot of the Wikipedia logo taken with Headless Chrome]

If you compare the real website with the screenshot, you can see that the image is blurred. In this example it's just an image, but the same thing happens with HTML text.

If I were to take a screenshot on my own computer, whether Windows, Mac, or Linux, I'd get a very good quality screenshot that looks exactly like the real thing.

So why does this happen? I tried all the standard things, such as setting the screenshot quality to the maximum in each library and setting a viewport big enough for the screenshot to have a decent resolution. Is this really the top quality you can get from a headless browser screenshot?

Any insight into this area would be appreciated. Thanks!

Bruno Smaldone asked Dec 27 '19 22:12




1 Answer

You will get better results by setting deviceScaleFactor to 2, which emulates a Retina (high-DPI) display:

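const puppeteer = require('puppeteer');

(async () => {
    // headless: false opens a visible window; set it to true to run headless
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    // deviceScaleFactor: 2 renders 2 device pixels per CSS pixel along each axis
    await page.setViewport({ width: 800, height: 800, deviceScaleFactor: 2 });
    await page.goto('https://en.wikipedia.org/wiki/Main_Page');

    await page.screenshot({ fullPage: true, path: 'test.png' });
    await browser.close();
})();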

As you can see, you can get very decent results:

[Image: screenshot taken with deviceScaleFactor set to 2]

Source: https://github.com/puppeteer/puppeteer/issues/571
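
To make the effect of deviceScaleFactor more concrete, here is a minimal sketch of the arithmetic involved. Both helper functions are hypothetical and are not part of Puppeteer; they only multiply the CSS viewport size by the scale factor. The sketch also assumes the screenshot is later viewed at its CSS size on a display with a devicePixelRatio of 2, which the question does not state but which would explain the blur.

```javascript
// Hypothetical helper, not part of Puppeteer: estimates the bitmap size of a
// viewport screenshot from the same options object passed to page.setViewport().
function screenshotPixelSize({ width, height, deviceScaleFactor = 1 }) {
  return {
    width: Math.round(width * deviceScaleFactor),
    height: Math.round(height * deviceScaleFactor),
  };
}

// Hypothetical helper: screenshot pixels available per physical display pixel
// (along one axis) when the image is shown at its CSS size on a display with
// the given devicePixelRatio. A value below 1 means the image gets upscaled.
function pixelsPerPhysicalPixel(deviceScaleFactor, displayPixelRatio) {
  return deviceScaleFactor / displayPixelRatio;
}

const standard = screenshotPixelSize({ width: 800, height: 800 });
const retina = screenshotPixelSize({ width: 800, height: 800, deviceScaleFactor: 2 });

console.log(standard); // { width: 800, height: 800 }
console.log(retina);   // { width: 1600, height: 1600 }

// Assumed viewing display with devicePixelRatio 2:
console.log(pixelsPerPhysicalPixel(1, 2)); // 0.5, upscaled and likely blurry
console.log(pixelsPerPhysicalPixel(2, 2)); // 1, one image pixel per display pixel
```

Under these assumptions, enlarging the viewport alone would not help: it adds more CSS pixels to the page but does not increase the number of bitmap pixels behind each CSS pixel.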

hardkoded answered Oct 20 '22 17:10
