 

Node.js and heap out of memory on Windows

I have an issue with a project of mine, which scans one or more directories for MP3 files and stores their metadata and paths in MongoDB. The main computer running the code is a Windows 10 64-bit machine with 8 GB of RAM and a 3.5 GHz quad-core AMD Ryzen CPU. Windows resides on an SSD, while the music sits on a 1 TB HDD.
The Node.js app can be launched manually from the command line or through npm. I'm using a recursive function to scan all the directories, and we're talking about roughly 20,000 files.
I've solved the "EMFILE: too many open files" issue through graceful-fs, but now I've run into a new one: JavaScript heap out of memory.
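For reference, graceful-fs slots in as a drop-in replacement for fs and queues open() calls instead of letting them fail with EMFILE. A minimal sketch of the two usual ways to wire it in (the second also covers third-party code that requires fs directly):

// Option 1: use graceful-fs directly wherever fs would be used
const fs = require('graceful-fs');

// Option 2: patch the global fs module once, near the entry point
const realFs = require('fs');
const gracefulFs = require('graceful-fs');
gracefulFs.gracefulify(realFs);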
Below is the complete output I receive:

C:\Users\User\Documents\GitHub\mp3manager>npm run scan

> [email protected] scan C:\Users\User\Documents\GitHub\mp3manager
> cross-env NODE_ENV=production NODE_OPTIONS='--max-old-space-size=4096' node scripts/cli/mm scan D:\Musica

Scanning 1 resources in production mode
Trying to connect to  mongodb://localhost:27017/music_manager
Connected to mongo...

<--- Last few GCs --->

[16744:0000024DD9FA9F40]   141399 ms: Mark-sweep 63.2 (70.7) -> 63.2 (71.2) MB, 47.8 / 0.1 ms  (average mu = 0.165, current mu = 0.225) low memory notification GC in old space requested
[16744:0000024DD9FA9F40]   141438 ms: Mark-sweep 63.2 (71.2) -> 63.2 (71.2) MB, 38.9 / 0.1 ms  (average mu = 0.100, current mu = 0.001) low memory notification GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x02aaa229e6e9 <JSObject>
    0: builtin exit frame: new ArrayBuffer(aka ArrayBuffer)(this=0x027bb3502801 <the_hole>,0x0202be202569 <Number 8.19095e+06>,0x027bb3502801 <the_hole>)

    1: ConstructFrame [pc: 000002AF8F50D385]
    2: createUnsafeArrayBuffer(aka createUnsafeArrayBuffer) [00000080419526C9] [buffer.js:~115] [pc=000002AF8F8440B1](this=0x027bb35026f1 <undefined>,size=0x0202be202569 <Number 8.19095e+06>)
    3:...

FATAL ERROR: Committing semi space failed. Allocation failed - JavaScript heap out of memory
 1: 00007FF6E36FF04A
 2: 00007FF6E36DA0C6
 3: 00007FF6E36DAA30
 4: 00007FF6E39620EE
 5: 00007FF6E396201F
 6: 00007FF6E3E82BC4
 7: 00007FF6E3E79C5C
 8: 00007FF6E3E7829C
 9: 00007FF6E3E77765
10: 00007FF6E3989A91
11: 00007FF6E35F0E52
12: 00007FF6E3C7500F
13: 00007FF6E3BE55B4
14: 00007FF6E3BE5A5B
15: 00007FF6E3BE587B
16: 000002AF8F55C721
npm ERR! code ELIFECYCLE
npm ERR! errno 134

I've tried to use NODE_OPTIONS='--max-old-space-size=4096', but I'm not even sure Node is honoring this option on Windows. I've also tried p-limit to cap the number of promises actually running, but honestly I'm a bit out of new ideas and I'm starting to think about using another language to see if it copes better with this kind of workload. Any advice would be appreciated. Have a nice day.
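(One detail that may explain why the option seems ignored: cmd.exe does not strip single quotes, so cross-env can end up setting NODE_OPTIONS to the literal string '--max-old-space-size=4096', quotes included. Since the value contains no spaces, the quotes can simply be dropped in package.json; a sketch based on the script shown in the output above:

"scan": "cross-env NODE_ENV=production NODE_OPTIONS=--max-old-space-size=4096 node scripts/cli/mm scan D:\\Musica"

Whether the limit actually took effect can be checked from any Node process started with the same NODE_OPTIONS:

node -e "console.log(require('v8').getHeapStatistics().heap_size_limit / 1024 / 1024, 'MB')"
)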

EDIT: I tried substituting the processDir function with the one posted by @Terry, but the result is the same.

Update 2019-08-19: to avoid the heap issues, I removed the recursion and switched to a queue of directories to visit:


const path = require('path');
const mm = require('music-metadata');
const _ = require('underscore');
const fs = require('graceful-fs');
const readline = require('readline');

const audioType = require('audio-type');
// const util = require('util');
const { promisify } = require('util');
const logger = require('../logger');
const { mp3hash } = require('../../../src/libs/utils');
const MusicFile = require('../../../src/models/db/mongo/music_files');

const getStats = promisify(fs.stat);
const readdir = promisify(fs.readdir);
const readFile = promisify(fs.readFile);
// https://github.com/winstonjs/winston#profiling

class MusicScanner {
    constructor(options) {
        const { paths, keepInMemory } = options;

        this.paths = paths;
        this.keepInMemory = keepInMemory === true;
        this.processResult = {
            totFiles: 0,
            totBytes: 0,
            dirQueue: [],
        };
    }

    async processFile(resource) {
        const buf = await readFile(resource);
        const fileRes = audioType(buf);
        if (fileRes === 'mp3') {
            this.processResult.totFiles += 1;

            // process the metadata
            this.processResult.totBytes += buf.length;
        }
    }

    async processDirectory() {
        while(this.processResult.dirQueue.length > 0) {
            const dir = this.processResult.dirQueue.shift();
            const dirents = await readdir(dir, { withFileTypes: true });
            const filesPromises = [];

            for (const dirent of dirents) {
                const resource = path.resolve(dir, dirent.name);
                if (dirent.isDirectory()) {
                    this.processResult.dirQueue.push(resource);
                } else if (dirent.isFile()) {
                    filesPromises.push(this.processFile(resource));
                }
            }

            await Promise.all(filesPromises);
        }
    }


    async scan() {
        const promises = [];

        const start = Date.now();

        for (const thePath of this.paths) {
            this.processResult.dirQueue.push(thePath);
            promises.push(this.processDirectory());
        }

        // processDirectory resolves with no value; the totals live on this.processResult
        await Promise.all(promises);
        return this.processResult;
    }
}

module.exports = MusicScanner;
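For completeness, here is roughly how I drive the class (a hypothetical snippet; the real invocation lives in the scripts/cli/mm entry point and the require path below is made up):

const MusicScanner = require('./music_scanner');

(async () => {
    const scanner = new MusicScanner({ paths: ['D:\\Musica'], keepInMemory: false });
    const result = await scanner.scan();
    console.log(`${result.totFiles} MP3 files, ${result.totBytes} bytes`);
})();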

The problem now is that the process takes 54 minutes to read 21K files, and I'm not sure how I could speed it up in this case. Any hints on that?

Chris asked Sep 18 '25 21:09
1 Answer

I'm not sure how helpful this will be, but I created a test script to see if I got the same results as you; I'm also running Windows 10.

It might be useful for you to run this script and see if you hit any issues. I am able to list all files in /program files/ (~91k files) or even /windows (~265k files) without blowing up, so maybe it's another operation, rather than simply listing the files, that's causing the problem.

The script returns a list of all the files under the path, so that's pretty much what you need. Once you have the list, you can iterate it linearly and add the details to your MongoDB instance.

const fs = require('fs');
const path = require('path');
const { promisify } = require('util');
const getStats = promisify(fs.stat);
const readdir = promisify(fs.readdir);

// Recursively collect every path (files and directories) under dir into fileList.
async function scanDir(dir, fileList) {
    let files = await readdir(dir);
    for (let file of files) {
        let filePath = path.join(dir, file);
        fileList.push(filePath);
        try {
            let stats = await getStats(filePath);
            if (stats.isDirectory()) {
                await scanDir(filePath, fileList);
            }
        } catch (err) {
            // Drop on the floor: stat can fail on locked or permission-restricted files.
        }
    }

    return fileList;
}

function logStats(fileList) {
    console.log("Scanned file count: ", fileList.length);
    console.log(`Heap total: ${parseInt(process.memoryUsage().heapTotal/1024)} KB, used: ${parseInt(process.memoryUsage().heapUsed/1024)} KB`);
}

async function testScan() {
    let fileList = [];
    let handle = setInterval(logStats, 5000, fileList);
    let startTime = new Date().getTime();
    await scanDir('/program files/', fileList);
    clearInterval(handle);
    console.log(`File count: ${fileList.length}, elapsed: ${(new Date().getTime() - startTime)/1000} seconds`);
}

testScan();
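One more thought on the 54 minutes: processFile in your update reads every MP3 in full just to detect its type, and on an HDD that alone can dominate the runtime. audio-type only inspects the leading magic bytes, so reading a small chunk from the start of each file should give the same answer with a tiny fraction of the I/O. A sketch, assuming a 4100-byte sample is enough (that figure is borrowed from file-type's recommended sample size, not something audio-type mandates):

const fs = require('fs');
const { promisify } = require('util');
const audioType = require('audio-type');

const open = promisify(fs.open);
const read = promisify(fs.read);
const close = promisify(fs.close);

// Read only the first few KB of the file -- enough for the
// magic-number check -- instead of buffering the whole MP3.
async function sniffType(filePath, sampleSize = 4100) {
    const fd = await open(filePath, 'r');
    try {
        const buf = Buffer.alloc(sampleSize);
        const { bytesRead } = await read(fd, buf, 0, sampleSize, 0);
        return audioType(buf.slice(0, bytesRead)); // e.g. 'mp3', or null if unrecognized
    } finally {
        await close(fd);
    }
}

If you then want the tags, music-metadata's parseFile accepts a path directly, so you never need to buffer the whole file yourself. Just keep graceful-fs (or a concurrency cap like p-limit) in front of this when calling it for many files at once, or the raw open() calls will hit EMFILE again.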
Terry Lennox answered Sep 21 '25 15:09