
puppeteer how to return page.on response values [duplicate]

I know this should be simple, but how do I return the values for use outside the function? I cannot get it to work. The following downloads the file and prints this in the console:

value: attachment; filename="filename"

await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: './tmp'})
await page.click('download');

await page.on('response', resp => {
    var header = resp.headers();
    console.log("value: " + header['content-disposition']); 
});

But this, like everything else I have tried, returns nothing:

await page.on('response', resp => {
     var header = resp.headers();
     return header['content-disposition'];  
 });

I want to be able to return the filename, file size, etc. of a downloaded file for further use in the script.

How do I return and access the response values?

Kevin asked Jul 22 '18 03:07


2 Answers

You shouldn't use the await operator before page.on().

The Puppeteer page class extends Node.js's native EventEmitter, which means that whenever you call page.on(), you are setting up an event listener using Node.js's emitter.on().

This means that the functionality you include in page.on('response') will execute when the response event is fired.

You don't return values from an event handler. Instead, the functionality within the event handler is executed when the event occurs.
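To make that concrete, here is a minimal sketch using Node.js's plain EventEmitter as a stand-in for the Puppeteer page (fakePage and the shape of the response object are assumptions for illustration, not Puppeteer objects). It shows that the value returned by the handler is discarded, and that on() hands back the emitter itself rather than a promise, so there is nothing for await to wait on.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the Puppeteer page; only the emitter behaviour matters here.
const fakePage = new EventEmitter();

// The handler's return value goes nowhere: nobody receives it.
const registered = fakePage.on('response', resp => {
  return resp.headers['content-disposition'];
});

// emit() reports whether any listener was called, not what the listener returned.
const emitted = fakePage.emit('response', {
  headers: { 'content-disposition': 'attachment; filename="filename"' },
});

console.log(registered === fakePage); // true: on() returns the emitter itself
console.log(emitted);                 // true: a listener was registered
```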

If you want to use a value from the response event elsewhere, pass it to a function from inside the handler:

const example_function = value => {
  console.log(value);
};

page.on('response', resp => {
  var header = resp.headers();
  example_function(header['content-disposition']);
});
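If the rest of the script needs to wait for the value, one option is to wrap the listener in a Promise and await that instead. The sketch below is only an illustration under assumptions: firstHeader is a hypothetical helper, and fakePage and fakeResponse are stand-ins built on Node.js's EventEmitter so the example runs without a browser. With a real Puppeteer page, the same helper should work as long as the page exposes on() and off().

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolves with the value of `headerName` from the first
// 'response' event that carries it, then removes its own listener.
function firstHeader(emitter, headerName) {
  return new Promise(resolve => {
    const onResponse = resp => {
      const value = resp.headers()[headerName];
      if (value === undefined) return; // not the response we are waiting for
      emitter.off('response', onResponse);
      resolve(value);
    };
    emitter.on('response', onResponse);
  });
}

// Stand-ins for a Puppeteer page and its response objects (assumptions).
const fakePage = new EventEmitter();
const fakeResponse = headers => ({ headers: () => headers });

async function demo() {
  // Register the listener BEFORE triggering the action that causes the response.
  const pending = firstHeader(fakePage, 'content-disposition');

  // These two emits stand in for what page.click(download) would trigger.
  fakePage.emit('response', fakeResponse({ 'content-type': 'text/html' }));
  fakePage.emit('response', fakeResponse({
    'content-disposition': 'attachment; filename="filename"',
  }));

  return pending; // resolves with the header value
}

const result = demo();
result.then(value => console.log('value: ' + value));
```

The important detail is the order: the listener is attached first, the click happens second, and only then is the promise awaited.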
Grant Miller answered Nov 07 '22 08:11


Grant, I've realised from your answer that I have made a few beginner mistakes.

  1. Puppeteer await - I thought await page.on() would pause the script until the response arrived. I was wrong.

  2. I had placed page.on() inside the loop, which caused errors. It should have been outside.

  3. The script was moving on to the next download page before the download had started and the page.on() handler had been called.

  4. I should have saved the file name inside the page.on() handler instead of outside it.

Correct me if I am wrong.

This is what I was trying to do (abbreviated).

async function main() {

 await page.goto(page, { waitUntil: 'networkidle0' });

 for(loop through download pages){

    await page.click(download);

    await page.on('response', resp => {
         var header = resp.headers();
         return header['content-disposition'];  
     });

    save.write(header['content-disposition']);
 }
}
main();

This is what has worked.

async function main() {

  // Register the listener once, before any navigation or click
  page.on('response', resp => {
    var header = resp.headers();
    var fileName = header['content-disposition'];
    save.write(fileName);
  });

  await page.goto(startPage, { waitUntil: 'networkidle0' });

  for(loop through download pages){

    await page.goto(downloadPage, { waitUntil: 'networkidle0' });

    await page.click(download);
    await page.waitFor(30000);
    //download starts
    //page.on called and saves fileName
    //page.waitFor gives it time to complete before starting next loop

  }
}
main();

About the line await page.waitFor(30000): I don't know if the await is required there.

Also, page.waitFor(30000) slows the script down, but I could not get it to work without it. There might be a better way.
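One possible alternative to a fixed 30-second pause is to wait only until the matching response arrives, with a timeout as an upper bound. As far as I know, Puppeteer has page.waitForResponse(), which takes a URL or predicate plus a timeout option, for this purpose. The sketch below shows the same idea with a plain Node.js EventEmitter so it runs without a browser. waitForEvent, fakePage, fakeDownload and the file names are all hypothetical stand-ins. (On the await question: page.waitFor() returns a promise, so without await the loop would not pause at all.)

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolves with the first `eventName` payload accepted by
// `predicate`, or rejects if none arrives within `timeoutMs` milliseconds.
function waitForEvent(emitter, eventName, predicate, timeoutMs) {
  return new Promise((resolve, reject) => {
    const onEvent = payload => {
      if (!predicate(payload)) return; // ignore unrelated events
      cleanup();
      resolve(payload);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`no matching ${eventName} within ${timeoutMs} ms`));
    }, timeoutMs);
    const cleanup = () => {
      clearTimeout(timer);
      emitter.off(eventName, onEvent);
    };
    emitter.on(eventName, onEvent);
  });
}

// Stand-ins (assumptions): an emitter instead of a page, and a timer that
// fires a 'response' event the way a click on a download link eventually would.
const fakePage = new EventEmitter();
const fakeDownload = name => setTimeout(() => {
  fakePage.emit('response', {
    headers: () => ({ 'content-disposition': `attachment; filename="${name}"` }),
  });
}, 10);

async function main() {
  const saved = [];
  for (const name of ['a.csv', 'b.csv']) {
    // Start listening first, then trigger the download, then wait.
    const pending = waitForEvent(
      fakePage, 'response',
      resp => 'content-disposition' in resp.headers(),
      1000
    );
    fakeDownload(name); // stands in for page.click(download)
    const resp = await pending;
    saved.push(resp.headers()['content-disposition']);
  }
  return saved;
}

const done = main();
done.then(saved => console.log(saved));
```

Note that this only waits for the response headers. It does not guarantee that the file has finished writing to disk, so a real script may still need a separate check for that.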

Kevin answered Nov 07 '22 06:11