
Is copying a large blob over to a worker expensive?

Using the Fetch API I'm able to make a network request for a large asset of binary data (say more than 500 MB) and then convert the Response to either a Blob or an ArrayBuffer.

Afterwards, I can either call worker.postMessage and let the standard structured clone algorithm copy the Blob over to a Web Worker, or transfer the ArrayBuffer over to the worker context (making it effectively no longer available from the main thread).
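
For reference, here is a minimal sketch of the two approaches being compared (the worker path and the asset URL are placeholders, and the code assumes it runs inside a module script or an async function):

  // main.js (sketch only): './worker.js' and '/large-asset.bin' are placeholder names
  const worker = new Worker('./worker.js');

  // Approach 1: fetch as a Blob and let structured cloning copy it to the worker
  const blob = await (await fetch('/large-asset.bin')).blob();
  worker.postMessage({ kind: 'blob', payload: blob });

  // Approach 2: fetch as an ArrayBuffer and transfer ownership to the worker
  const buffer = await (await fetch('/large-asset.bin')).arrayBuffer();
  worker.postMessage({ kind: 'buffer', payload: buffer }, [buffer]);
  // After the transfer, `buffer` is detached and no longer usable on the main thread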

At first glance it would seem much preferable to fetch the data as an ArrayBuffer, since a Blob is not transferable and will therefore need to be copied over. However, Blobs are immutable, so it seems the browser doesn't store them in the JS heap associated with the page but rather in a dedicated blob storage space; what ends up being copied over to the worker context is just a reference.

I've prepared a demo to try out the difference between the two approaches: https://blobvsab.vercel.app/. I'm fetching 656 MB worth of binary data using both approaches.

Something interesting I've observed in my local tests is that copying the Blob is even faster than transferring the ArrayBuffer:

Blob copy time from main thread to worker: 1.828125 ms

ArrayBuffer transfer time from main thread to worker: 3.393310546875 ms
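
Those timings presumably come from wrapping the postMessage call itself, which measures only the synchronous serialization work on the main thread, not the delivery of the message to the worker. A rough sketch of such a measurement, assuming a worker, a blob, and a not-yet-transferred buffer obtained as in the sketch above:

  // Measures only the synchronous serialization work done by postMessage
  // on the main thread, not the delivery of the message to the worker.
  let start = performance.now();
  worker.postMessage(blob);
  console.log(`Blob copy time from main thread to worker: ${performance.now() - start} ms`);

  start = performance.now();
  worker.postMessage(buffer, [buffer]);
  console.log(`ArrayBuffer transfer time from main thread to worker: ${performance.now() - start} ms`);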

This is a strong indicator that dealing with Blobs is actually pretty cheap. Since they're immutable, the browser seems to be smart enough to treat them as a reference rather than tying the underlying binary data to those references.

Here are the heap memory snapshots I've taken when fetching as a Blob:

[Screenshot: heap memory inspector after fetching the binary data as a Blob]

The first two snapshots were taken after the Blob resulting from the fetch was copied over to the worker context using postMessage. Notice that neither of those heaps includes the 656 MB.

The last two snapshots were taken after I used a FileReader to actually access the underlying data, and, as expected, the heap grew a lot.
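
For illustration, a sketch of what that read might look like inside the worker (assuming the message shape from the earlier sketch; blob.arrayBuffer() is a newer alternative to FileReader):

  // worker.js: the Blob's bytes only land on this heap once they are actually read
  self.onmessage = (event) => {
    const blob = event.data.payload;

    const reader = new FileReader();
    reader.onload = () => {
      const bytes = new Uint8Array(reader.result); // ~656 MB now lives in the worker heap
      console.log('read', bytes.byteLength, 'bytes');
    };
    reader.readAsArrayBuffer(blob);

    // Promise-based alternative: blob.arrayBuffer().then((buf) => { /* ... */ });
  };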

Now, this is what happens with fetching directly as an ArrayBuffer:

[Screenshot: heap memory inspector after fetching the binary data as an ArrayBuffer]

Here, since the binary data was simply transferred over to the worker thread, the heap of the main thread is small, but the worker heap contains the entire 656 MB even before reading the data.

Now, looking around on SO I see that "What is the difference between an ArrayBuffer and a Blob?" mentions a lot of underlying differences between the two structures, but I haven't found a good reference on whether one should worry about copying a Blob between execution contexts versus what would seem to be an inherent advantage of ArrayBuffers, namely that they're transferable. However, my experiments show that copying the Blob might actually be faster, and thus preferable.

It seems to be up to each browser vendor how Blobs are stored and handled. I've found Chromium documentation describing how all Blobs are transferred from each renderer process (i.e. a page in a tab) to the browser process, which lets Chrome even offload a Blob to secondary memory if needed.

Does anyone have more insight into all of this? If I can choose how to fetch some large binary data over the network and move it to a Web Worker, should I prefer a Blob or an ArrayBuffer?

asked Aug 28 '20 by bitsoverflow




1 Answer

No, it's not expensive at all to postMessage a Blob.

The cloning steps of a Blob are

Their serialization steps, given value and serialized, are:

  1. Set serialized.[[SnapshotState]] to value’s snapshot state.

  2. Set serialized.[[ByteSequence]] to value’s underlying byte sequence.

Their deserialization steps, given serialized and value, are:

  1. Set value’s snapshot state to serialized.[[SnapshotState]].

  2. Set value’s underlying byte sequence to serialized.[[ByteSequence]].

In other words, nothing is copied: both the snapshot state and the byte sequence are passed by reference (even though the wrapping JS object is not).
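
One observable consequence of this (a small sketch for illustration, assuming an existing Worker instance named worker): the Blob on the sending side remains fully usable after postMessage, whereas a transferred ArrayBuffer does not.

  const blob = new Blob([new Uint8Array(1024)]);
  worker.postMessage(blob);
  console.log(blob.size); // 1024: the Blob is still usable on the main thread

  const buffer = new Uint8Array(1024).buffer;
  worker.postMessage(buffer, [buffer]);
  console.log(buffer.byteLength); // 0: the ArrayBuffer is detached after being transferred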

However, regarding your full project, I wouldn't advise using Blobs here, for two reasons:

  1. The fetch algorithm first fetches as an ArrayBuffer internally. Requesting a Blob adds an extra step there (which consumes memory).
  2. You'll probably need to read that Blob from the Worker, adding yet another step (which will also consume memory, since here the data will actually get copied); see the sketch below.
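
A minimal sketch of the path suggested here, with placeholder names: fetch once as an ArrayBuffer and transfer it, so no extra Blob or FileReader step is needed.

  // main.js (assumes a module script, so top-level await is available)
  const worker = new Worker('./worker.js');
  const buffer = await (await fetch('/large-asset.bin')).arrayBuffer();
  worker.postMessage(buffer, [buffer]); // zero-copy transfer of ownership

  // worker.js: the transferred data is immediately usable, no further read or copy step
  self.onmessage = ({ data }) => {
    const view = new Uint8Array(data);
    console.log('received', view.byteLength, 'bytes');
  };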
answered Sep 19 '22 by Kaiido