Node.js and Request - limiting the file size of downloaded file

I want to download a file with the Request library. It's pretty straightforward:

request({
    url: urlToFile
}).pipe(fs.createWriteStream(file));

Since the URL is supplied by users (in my case), I would like to limit the maximum file size my application will download - let's say 10 MB. I could rely on the content-length header, like so:

request({
    url: urlToFile
}, function (err, res, body) {
    var size = parseInt(res.headers['content-length'], 10);

    if (size > 10485760) {
        // ooops - file size too large
    }
}).pipe(fs.createWriteStream(file));

The question is - how reliable is this? I guess this callback will be called after the file has been downloaded, right? But then it's too late if someone supplies the URL of a file which is 1 GB. My application would first download that entire 1 GB file just to discover (in the callback) that it is too big.

I was also thinking about Node's good old http.get() method. In that case I would do this:

var opts = {
    host: host,
    port: port,
    path: path
};

var file = fs.createWriteStream(fileName),
    fileLength = 0;

http.get(opts, function (res) {
    res.on('data', function (chunk) {
        fileLength += chunk.length;

        if (fileLength > 10485760) { // ooops - file size too large
            file.end();
            return res.destroy(); // stop the download; the response is a readable stream, so it has no end() method
        }

        file.write(chunk);
    }).on('end', function () {
        file.end();
    });
});

What approach would you recommend to limit the maximum download file size, without actually downloading the whole thing and checking its size after the fact?

asked Aug 01 '14 by Pono


2 Answers

I would actually use both methods you've discussed: check the content-length header, and watch the data stream to make sure it doesn't exceed your limit.

To do this I'd first make a HEAD request to the URL to see if the content-length header is available. If it's larger than your limit, you can stop right there. If it doesn't exist or it's smaller than your limit, make the actual GET request. Since a HEAD request will only return the headers and no actual content, this will help weed out large files with valid content-lengths quickly.

Next, make the actual GET request and watch the incoming data size to make sure it doesn't exceed your limit (this can be done with the request module; see below). You'll want to do this regardless of whether the HEAD request found a content-length header, as a sanity check (the server could be lying about the content-length).

Something like this:

var maxSize = 10485760;

request({
    url: url,
    method: "HEAD"
}, function(err, headRes) {
    var size = parseInt(headRes.headers['content-length'], 10);
    if (size > maxSize) {
        console.log('Resource size exceeds limit (' + size + ')');
    } else {
        var file = fs.createWriteStream(filename),
            downloaded = 0;

        var res = request({ url: url });

        res.on('data', function(data) {
            downloaded += data.length;

            if (downloaded > maxSize) {
                console.log('Resource stream exceeded limit (' + downloaded + ')');

                res.abort(); // Abort the request (close and clean up the stream)
                fs.unlink(filename, function () {}); // Delete the partially downloaded file
            }
        }).pipe(file);
    }
});

The trick to watching the incoming data size with the request module is to bind to the data event on the object returned by request() (like you were thinking of doing with the http module) before you start piping it to your file stream. If the incoming data exceeds your maximum file size, call its abort() method.
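
For reference, here is how the same idea could be wrapped into a small reusable helper. This is only a sketch and not part of the answer above; the function name, callback shape, and error messages are mine, and it assumes the request module is installed:

var fs = require('fs');
var request = require('request');

// Hypothetical helper: streams `url` into `filename` and aborts as soon as
// more than `maxSize` bytes have been received.
function downloadWithLimit(url, filename, maxSize, callback) {
    var downloaded = 0;
    var finished = false;
    var file = fs.createWriteStream(filename);
    var req = request({ url: url });

    function done(err) {
        if (finished) return; // make sure the callback fires only once
        finished = true;
        callback(err, err ? null : filename);
    }

    req.on('data', function (chunk) {
        downloaded += chunk.length;
        if (downloaded > maxSize) {
            req.abort();                          // stop the transfer
            file.end();                           // close the partial file...
            fs.unlink(filename, function () {});  // ...and remove it
            done(new Error('download exceeded ' + maxSize + ' bytes'));
        }
    });

    req.on('error', done);
    file.on('error', done);
    file.on('finish', function () { done(null); });

    req.pipe(file);
}

// Usage (hypothetical URL and path):
// downloadWithLimit('http://example.com/big.bin', '/tmp/big.bin', 10485760,
//     function (err, path) { console.log(err || ('saved to ' + path)); });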

answered Oct 22 '22 by Mike S


I had a similar issue. I now use fetch (node-fetch) and its size option to limit the download size:

const fetch = require('node-fetch');

const response = await fetch(url, {
    method: 'GET',
    size: 5000000, // maximum response body size in bytes, 5000000 = 5MB
}).catch(e => { throw e })
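
Note that the size option comes from node-fetch; it is not part of the WHATWG fetch spec. When the limit is exceeded, node-fetch rejects with a FetchError whose type is 'max-size', so you can detect that case specifically. A minimal sketch, assuming node-fetch v2 and a hypothetical fetchLimited() helper name:

const fetch = require('node-fetch');

async function fetchLimited(url) {
    try {
        const response = await fetch(url, { size: 5000000 }); // cap the body at 5 MB
        return await response.buffer(); // the limit is enforced while the body is read
    } catch (err) {
        if (err.type === 'max-size') {
            // node-fetch signals an exceeded size limit this way
            console.error('Response body exceeded the 5 MB limit');
            return null;
        }
        throw err; // some other network or parsing error
    }
}

// Usage (hypothetical URL):
// fetchLimited('http://example.com/file.bin').then(buf => { /* ... */ });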

answered Oct 22 '22 by NFpeter