Doubts about the event loop, multi-threading, and the order of execution of readFile() in Node.js

const fs = require('fs');

fs.readFile("./large.txt", "utf8", (err, data) => {
    // This file has many words (11X MB).
    // It takes 1-2 seconds to finish reading (usually 1).
    console.log('It is a large file');
});

fs.readFile("./small.txt", "utf8", (err, data) => {
    // This file has just one word.
    // It always takes 0 seconds.
    for (let i = 0; i < 99999; i++)
        console.log('It is a small file');
});

Result:

The console always prints "It is a small file" 99999 times first (it takes around 3 seconds to finish printing). Then, after they are all printed, the console does not immediately print "It is a large file"; that line always appears another 1 or 2 seconds later.

My thought:

So it seems that the first readFile() and the second readFile() do not run in parallel. If the two readFile() calls ran in parallel, I would expect that by the time "It is a small file" had been printed 99999 times, the first readFile() would long since have finished reading (it needs only about 1 second) and the console would immediately print the output of the first readFile()'s callback (i.e. "It is a large file").

My questions are:

(1a) Does this mean that the first readFile() starts reading the file only after the callback of the second readFile() has done its work?

(1b) To my understanding, in Node.js the event loop hands readFile() off to libuv's thread pool. However, I wonder in what order they are handed off. If these two readFile() calls do not run in parallel, why is the second readFile() always executed first?

(2) By default, libuv gives Node.js a thread pool of four threads. So, here, do these two readFile() calls run on the same thread? Among these four threads, I am not sure whether only one is used for readFile().

Thank you very much for spending your time! I appreciate it!

asked Mar 02 '23 by John Chau
1 Answer

I couldn't possibly believe that node would delay the large file read until the callback for the small file read had completed, and so I did a little more instrumentation of your example:

const fs = require('fs');

const readLarge = process.argv.includes('large');
const readSmall = process.argv.includes('small');

if (readLarge) {
    console.time('large');
    fs.readFile('./large.txt', 'utf8', (err, data) => {
        console.timeEnd('large');
        if (readSmall) {
            console.timeEnd('large (since busy wait)');
        }
    });
}

if (readSmall) {
    console.time('small');
    fs.readFile('./small.txt', 'utf8', (err, data) => {
        console.timeEnd('small');
        var stop = new Date().getTime();
        while(new Date().getTime() < stop + 3000) { ; } // busy wait
        console.time('large (since busy wait)');
    });
}

(Note that I replaced your loop of console.logs with a 3s busy wait).

Running this against node v8.15.0 I get the following results:

$ node t small # read small file only
small: 0.839ms
$ node t large # read large file only
large: 447.348ms
$ node t small large # read both files
small: 3.916ms
large: 3252.752ms
large (since busy wait): 247.632ms

These results seem sane; the large file took ~0.5s to read on its own, but when the busy-waiting callback interfered for 3s, it completed relatively soon (~1/4s) thereafter. Tweaking the length of the busy wait keeps this relatively consistent, so I'd be willing to say this was some kind of scheduling overhead and not necessarily a sign that the large file I/O had not been running during the busy wait.

But then I ran the same program against node 10.16.3, and here's what I got:

$ node t small
small: 1.614ms
$ node t large
large: 1019.047ms
$ node t small large
small: 3.595ms
large: 4014.904ms
large (since busy wait): 1009.469ms

Yikes! Not only did the large file read time more than double (to ~1s), it certainly appears as if no I/O at all had been completed before the busy wait concluded! i.e., it sure looks like the busy wait in the main thread prevented any I/O at all from happening on the large file.

I suspect that this change from 8.x to 10.x is a result of this "optimization" in Node 10: https://github.com/nodejs/node/pull/17054. This change, which splits the read of large files into multiple operations, seems appropriate for smoothing overall system performance in general-purpose cases, but its cost is exacerbated by the unnaturally long main-thread processing / busy wait in this scenario. Presumably, without the main thread yielding, the I/O does not get a chance to advance to the next range of bytes to be read from the large file.

It would seem that, with Node 10.x, it is important to have a responsive main thread (i.e., one that yields frequently, and doesn't busy wait like in this example) in order to sustain I/O performance of large file reads.

answered Mar 05 '23 by Myk Willis