I am trying to generate images of web pages in under a second in a server-side environment. The requests could come in parallel, at the same time, from the web. To that end, I am using the Puppeteer-Sharp library, which works pretty well. On the back end it's using Chromium to load the page and then screenshot it.
The problem is that it takes a while to get started. For instance, note the timings (from my pc) from the readme.md sample code:
var options = new LaunchOptions { Headless = true, ExecutablePath = @"c:\foo\chrome.exe" };
var browser = await Puppeteer.LaunchAsync(options); // ~500ms
var page = await browser.NewPageAsync(); // ~215ms
var webPage = await page.GoToAsync("http://www.google.com"); // ~500ms
await page.ScreenshotAsync(outputFile); // ~300ms
As you can see, it easily goes over a second. I don't know how Chromium works internally, so I have a couple of questions pertaining to solutions that I am thinking of.
Is the PuppeteerSharp.Browser object thread-safe and/or re-entrant? Can I use the same browser object from different threads? I am thinking not, because it's tied to a specific instance of Chromium in memory.
Removing .LaunchAsync and .NewPageAsync from every request would significantly speed up the operation. Will a pool of PuppeteerSharp.Browser objects work? For instance, I can pre-allocate 5 of these and execute .NewPageAsync on them. Then the incoming requests would use the objects from the pool. Is that a viable approach?
Although there are still many improvements going on, Puppeteer-Sharp is thread-safe. To improve loading performance, there are a few approaches you can take.
Launch one browser and then connect to it
You can launch one (real) browser and then use the ConnectAsync method to connect to it.
await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = false,
});
var theBrowser1 = await Puppeteer.ConnectAsync(new ConnectOptions { BrowserWSEndpoint = browser.WebSocketEndpoint });
var theBrowser2 = await Puppeteer.ConnectAsync(new ConnectOptions { BrowserWSEndpoint = browser.WebSocketEndpoint });
var page1 = await theBrowser1.NewPageAsync();
var page2 = await theBrowser2.NewPageAsync();
await Task.WhenAll(
    page1.GoToAsync("https://www.stackoverflow.com"),
    page2.GoToAsync("https://serverfault.com/")
);
I know that code is not running in parallel, but you'll get the idea about reusing the same browser.
Create new pages on the same browser
If you are using TPL, you shouldn't have any issues creating new pages from different threads using the same browser.
await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = false,
});
var urls = new string[]
{
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com",
    "https://www.stackoverflow.com"
};
await Task.WhenAll(
    urls.Select(url => Task.Run(async () =>
    {
        // Task.Run unwraps the async lambda, so WhenAll waits for the navigation itself.
        var page = await browser.NewPageAsync();
        await page.GoToAsync(url);
    })));
Again, this example is just to give you an idea of how this could be accomplished.
Pages queue
One user created a queue of X pages up front (calling NewPageAsync X times) and then took pages from that queue as requests came in. You can see the example here.
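To make the queue idea a bit more concrete, here is a minimal sketch of how such a pool could look. It is not that user's code: PagePool, CreateAsync and UseAsync are hypothetical names made up for this illustration, and the pool is generic so the demo can run with plain strings standing in for pages, without a browser.

```csharp
using System;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Demo: strings stand in for pages so the sketch runs without Chromium.
var pool = await PagePool<string>.CreateAsync(3, i => Task.FromResult($"page-{i}"));

// Ten concurrent "requests" share the three pre-created items.
var results = await Task.WhenAll(
    Enumerable.Range(0, 10).Select(n => pool.UseAsync(async page =>
    {
        await Task.Delay(10); // simulated navigation + screenshot
        return $"{page} handled request {n}";
    })));

Console.WriteLine(results.Length); // 10

// Hypothetical helper (not part of PuppeteerSharp): a fixed-size async pool.
public sealed class PagePool<T>
{
    // Idle items wait in the channel; readers wait asynchronously when it is empty.
    private readonly Channel<T> _idle = Channel.CreateUnbounded<T>();

    private PagePool() { }

    // Pre-creates `size` items with the supplied factory.
    public static async Task<PagePool<T>> CreateAsync(int size, Func<int, Task<T>> factory)
    {
        var pool = new PagePool<T>();
        for (var i = 0; i < size; i++)
        {
            await pool._idle.Writer.WriteAsync(await factory(i));
        }
        return pool;
    }

    // Borrows an item, runs the work, and always returns the item to the pool.
    public async Task<TResult> UseAsync<TResult>(Func<T, Task<TResult>> work)
    {
        var item = await _idle.Reader.ReadAsync();
        try
        {
            return await work(item);
        }
        finally
        {
            _idle.Writer.TryWrite(item);
        }
    }
}
```

With Puppeteer-Sharp, the factory would presumably be something like _ => browser.NewPageAsync(), with T being the page type of the version you have installed, and the work delegate would call GoToAsync and take the screenshot. A reused page keeps its state between requests (cookies, the last loaded document), so whether that is acceptable depends on your use case.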