How to download file with puppeteer using headless: true?

I've been running the following code to download a CSV file from http://niftyindices.com/resources/holiday-calendar:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();

    await page.goto('http://niftyindices.com/resources/holiday-calendar');
    await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: '/tmp'});
    await page.click('#exportholidaycalender');
    await page.waitFor(5000);
    await browser.close();
})();

With headless: false it works and downloads the file into /Users/user/Downloads. With headless: true it does NOT work.

I'm running this on macOS Sierra (MacBook Pro) with puppeteer version 1.1.1, which pulls Chromium version 66.0.3347.0 into the .local-chromium/ directory. I set it up with npm init and npm i --save puppeteer.

Any idea what's wrong?

Thanks in advance for your time and help.

asked Mar 12 '18 22:03

Antonio Gomez Alvarado

3 Answers

I spent hours poring through this thread and Stack Overflow yesterday, trying to figure out how to get Puppeteer to download a CSV file by clicking a download link in headless mode in an authenticated session. The accepted answer here didn't work in my case because the download does not trigger targetcreated, and the next answer, for whatever reason, did not retain the authenticated session. This article saved the day. In short: use fetch from within the page, so the request carries the session's cookies. Hopefully this helps someone else out.

const res = await this.page.evaluate(() =>
{
    return fetch('https://example.com/path/to/file.csv', {
        method: 'GET',
        credentials: 'include'
    }).then(r => r.text());
});
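
As a rough sketch of how the fetched text could then be written to disk from the Node side: the helper name downloadViaFetch, its parameters, and the destination path are assumptions made for illustration and are not part of the answer above. It relies only on page.evaluate (with the URL passed in as an argument) and fs.promises.writeFile. Note that r.text() suits text files such as CSV; a binary file would need a different encoding step.

```javascript
const fs = require('fs');

// Hypothetical helper: fetch `url` inside the page (so the session cookies
// are sent) and write the response body to `destPath` on the Node side.
async function downloadViaFetch(page, url, destPath) {
    // The callback runs in the browser context; `url` arrives as `fileUrl`.
    const text = await page.evaluate((fileUrl) => {
        return fetch(fileUrl, {
            method: 'GET',
            credentials: 'include'
        }).then(r => {
            if (!r.ok) {
                throw new Error('Request failed with status ' + r.status);
            }
            return r.text();
        });
    }, url);

    // Back in Node: persist the text and report how many characters were written.
    await fs.promises.writeFile(destPath, text, 'utf8');
    return text.length;
}
```

It would be called after logging in on the same page, for example: await downloadViaFetch(page, 'https://example.com/path/to/file.csv', '/tmp/file.csv');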

answered Oct 12 '22 00:10

Justin

This page downloads a CSV by building a comma-delimited string and forcing the browser to download it by setting the data type, like so:

let uri = "data:text/csv;charset=utf-8," + encodeURIComponent(content);
window.open(uri, "Some CSV");

In Chrome, this opens a new tab.

You can tap into this event and write the contents to a file yourself. Not sure if this is the best way, but it works well.

const fs = require('fs');

const browser = await puppeteer.launch({
    headless: true
});
browser.on('targetcreated', async (target) => {
    let s = target.url();
    // the test opens an about:blank to start - ignore this
    if (s === 'about:blank') {
        return;
    }
    // remove the content type prefix, leaving the encoded CSV
    s = s.replace("data:text/csv;charset=utf-8,", "");
    // clean up string by unencoding the %xx
    ...
    fs.writeFile("/tmp/download.csv", s, function(err) {
        if (err) {
            console.log(err);
            return;
        }
        console.log("The file was saved!");
    });
});

const page = await browser.newPage();
// ... open link ...
// ... click on download link ...
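
One possible way to do the decoding step is sketched below. The helper name decodeCsvDataUri is hypothetical, and the sketch assumes the page builds the URL exactly as shown earlier (the data:text/csv;charset=utf-8, prefix followed by encodeURIComponent output); a page that encodes its data differently would need a different approach.

```javascript
// Hypothetical helper: turn a "data:text/csv;charset=utf-8,..." URL back into
// the CSV text it encodes. Returns null when the URL is not such a data URI.
function decodeCsvDataUri(uri) {
    const prefix = 'data:text/csv;charset=utf-8,';
    if (!uri.startsWith(prefix)) {
        return null;
    }
    // decodeURIComponent reverses the encodeURIComponent call made by the page.
    return decodeURIComponent(uri.slice(prefix.length));
}
```

Inside the targetcreated handler, the result of decodeCsvDataUri(target.url()) could be passed to fs.writeFile whenever it is not null, which also skips the initial about:blank target.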

answered Oct 12 '22 01:10

Sumit Mishra

The problem is that the browser closes before the download has finished.

You can get the file size and the file name from the response headers, and then watch the downloaded file until its size matches, at which point it is safe to close the browser.

Here is an example:

    const fs = require('fs');

    const filename = "set this with some regex in response";
    const dir = "watch folder or file";

    // Register the listener before clicking so the response is not missed
    page.on('response', response => {
        const headers = response.headers();
        // If the response carries the file as an attachment
        if (headers['content-disposition'] === `attachment;filename=${filename}`) {
            // Get the expected size
            const expectedSize = parseInt(headers['content-length'], 10);
            console.log('Size from header: ', expectedSize);
            // Watch the download folder or file
            fs.watchFile(dir, (curr, prev) => {
                // Once the size on disk equals the size from the response, stop watching and close
                if (curr.size === expectedSize) {
                    fs.unwatchFile(dir);
                    browser.close();
                }
            });
        }
    });

    // Trigger the download
    await page.click('#DownloadFile');

The way the response is matched could be improved, but I hope you find this useful.
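
The same idea can also be sketched as a promise that is awaited before closing the browser, instead of closing it from inside a watcher callback. The helper name waitForFileSize, its parameters, and the default timeout and polling interval are assumptions for illustration. It polls with fs.promises.stat rather than fs.watchFile, and it assumes the size on disk will eventually reach the content-length value, which may not hold if the response is compressed.

```javascript
const fs = require('fs');

// Hypothetical helper: resolve once the file at `filePath` has reached
// `expectedSize` bytes, or reject after `timeoutMs` milliseconds.
async function waitForFileSize(filePath, expectedSize, timeoutMs = 30000, intervalMs = 100) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        try {
            const stats = await fs.promises.stat(filePath);
            if (stats.size >= expectedSize) {
                return stats.size;
            }
        } catch (err) {
            // The file may not exist yet; any other error is unexpected.
            if (err.code !== 'ENOENT') {
                throw err;
            }
        }
        // Sleep before checking again.
        await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
    throw new Error(`Timed out waiting for ${filePath} to reach ${expectedSize} bytes`);
}
```

With this, the script could await waitForFileSize(downloadedFilePath, expectedSize) after the click and only then call browser.close(), where downloadedFilePath and expectedSize stand for the values taken from the response headers.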

answered Oct 12 '22 01:10

Juan Carlos Migliavacca


Tags: node.js, puppeteer, chromium