
Multi-threading for zip in nodejs

Can zip and unzip operations be made multithreaded in Node.js?

There are a number of modules like yauzl, but none of them uses multiple threads, and you can't start multiple threads yourself with node-cluster or something similar, because each zip file must be handled in a single thread.

Alex asked Nov 15 '19 07:11



4 Answers

According to the zlib documentation:

Threadpool Usage: All zlib APIs, except those that are explicitly synchronous, use libuv's threadpool. This can lead to surprising effects in some applications, such as subpar performance (which can be mitigated by adjusting the pool size) and/or unrecoverable and catastrophic memory fragmentation. https://nodejs.org/api/zlib.html#zlib_threadpool_usage

According to the libuv threadpool documentation, you can set the environment variable UV_THREADPOOL_SIZE to change the maximum size of the pool.
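A minimal sketch of what this looks like in practice, assuming the process is launched with the variable already set in the shell. The helper name gzipAll and the file name compress.js are hypothetical; zlib.gzip, zlib.gunzip and util.promisify are standard Node.js APIs.

```javascript
// Assumption: the process was started with a larger pool, for example
//   UV_THREADPOOL_SIZE=8 node compress.js
// libuv's default pool size is 4 when the variable is unset.
const zlib = require('zlib');
const { promisify } = require('util');

const gzip = promisify(zlib.gzip);
const gunzip = promisify(zlib.gunzip);

// Hypothetical helper: compress every buffer concurrently.
// The asynchronous zlib calls do their work on libuv pool threads,
// so the pool size bounds how many can run in parallel.
function gzipAll(buffers) {
  return Promise.all(buffers.map((buffer) => gzip(buffer)));
}

const poolSize = Number(process.env.UV_THREADPOOL_SIZE) || 4;
console.log(`libuv thread pool size: ${poolSize}`);
```

Usage would look like gzipAll([bufferA, bufferB]).then(handleResults), where handleResults is your own callback. Note that this parallelizes across independent buffers; a single stream is still compressed sequentially.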

If you instead want to compress many small files at the same time, you can use Worker Threads: https://nodejs.org/api/worker_threads.html

On reading your question again, it seems you want to process multiple files. Use Worker Threads; they will not block your main thread, and you can get their output back via promises.
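As a rough, single-file sketch of that idea: the worker source is inlined with the eval option purely to keep the example self-contained, and the names gzipInWorker and gzipMany are hypothetical. It starts one worker per input, which is an assumption made for brevity; with many small files you would likely reuse a fixed set of workers instead.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const zlib = require('zlib');
  // gzipSync blocks only this worker thread, not the main thread.
  parentPort.postMessage(zlib.gzipSync(Buffer.from(workerData.input)));
`;

// Hypothetical helper: gzip one buffer in its own worker thread.
function gzipInWorker(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { input } });
    // The result arrives as a Uint8Array; wrap it in a Buffer for convenience.
    worker.once('message', (bytes) => resolve(Buffer.from(bytes)));
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

// Hypothetical helper: compress several buffers in parallel, one worker each.
function gzipMany(inputs) {
  return Promise.all(inputs.map(gzipInWorker));
}
```

Calling gzipMany([bufferA, bufferB]) returns a promise that resolves to the compressed buffers in the same order as the inputs.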

Strike Eagle answered Oct 21 '22 10:10


Node.js provides both libuv and worker threads. Worker threads let you run operations in a multi-threaded manner, while libuv maintains a thread pool whose size you can increase beyond the Node.js default. You can use both to improve the performance of your operation.

Here is the official documentation for worker threads: https://nodejs.org/api/worker_threads.html

To see how to increase the thread pool size in Node.js, see: print libuv threadpool size in node js 8

Slim Coder answered Oct 21 '22 10:10


Here is how to do multi-threading in Node.js. You will have to create the three files below.

index.mjs

import run from './Worker.mjs';

/**
 * Build your list of input files here and pass them to `run` one file name at a time,
 * for example in a loop. `run` returns a promise.
 * Example: run( <your_input> ).then( <your_output> );
 */

Worker.mjs

import { Worker } from 'worker_threads';

function runService(id, options) {
    return new Promise((resolve, reject) => {
        const worker = new Worker('./src/WorkerService.mjs', { workerData: { <your_input> } });
        worker.on('message', res => resolve({ res: res, threadId: worker.threadId }));
        worker.on('error', reject);
        worker.on('exit', code => {
            if (code !== 0)
                reject(new Error(`Worker stopped with exit code ${code}`));
        });
    });
}

async function run(id, options) {
    return await runService(id, options);
}

export default run;

WorkerService.mjs

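import { parentPort, workerData } from 'worker_threads';

// Your logic for zipping a file goes here; `workerData` holds <your_input>.
// Post the result back so that the promise in Worker.mjs resolves:
// parentPort.postMessage( <your_output> );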

Let me know if it helps.

Akshay answered Oct 21 '22 09:10


Can zip and unzip operations be made multithreaded in Node.js?

Yes.

...and you can't start multiple threads yourself ... because each zip file must be handled in a single thread

I suspect your premise is faulty. Why exactly do you think a node process cannot start multiple threads? Here is an app I'm running which is using the very mature node.js cluster module with a parent process acting as a supervisor and two child processes doing heavily network and disk I/O bound tasks.

[Image: top output showing node.js processes using CPU threads]

As you can see in the C column, each process is running on a separate CPU. This lets the master process remain responsive for command and control tasks (like spawning/reaping workers) while the worker processes are CPU or disk bound. This particular server accepts files from the network, sometimes decompresses them, and feeds them through external file processors. In other words, it's a task that includes compression, like the one you describe.

I'm not sure you'd want to use worker threads based on this snippet from the docs:

Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.

To me, that description screams, "crypto!" In the past I've spawned child processes when having to perform any expensive crypto operations.

In another project I use node's child_process module and kick off a new child process each time I have a batch of files to compress. That particular service sees a list of ~400 files with names like process-me-2019.11.DD.MM and concatenates them into a single process-me-2019-11-DD file. It takes a while to compress so spawning a new process avoids blocking on the main thread.
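A hedged sketch of the child-process approach, not the actual service described above: it spawns a second Node.js process that gzips whatever arrives on its stdin and writes the result to stdout. The function name gzipInChild is hypothetical, and passing the data as a Buffer (rather than streaming a list of files) is an assumption made to keep the example short.

```javascript
const { spawn } = require('child_process');

// Script run by the child: gzip stdin to stdout.
const childScript =
  "process.stdin.pipe(require('zlib').createGzip()).pipe(process.stdout)";

// Hypothetical helper: compress a buffer in a separate process so the
// parent's main thread never spends CPU time on the compression itself.
function gzipInChild(input) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childScript], {
      stdio: ['pipe', 'pipe', 'inherit'],
    });
    const chunks = [];
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.on('error', reject);
    // 'close' fires after the child has exited and its stdio streams have ended.
    child.on('close', (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`Child exited with code ${code}`));
    });
    child.stdin.end(input);
  });
}
```

To concatenate a batch of files as described, the parent could write each file's contents to child.stdin in turn before ending the stream, so that the child emits a single compressed output.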

Matt Simerson answered Oct 21 '22 08:10