Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to download a csv file using PhantomJS

When I'm browsing a website A using normal browser (Chrome) and when I click on a link on the website A, Chrome imediatelly downloads report in a form of CSV file.

When I checked a server response headers I get the following results:

Cache-Control:private,max-age=31536000
Connection:Keep-Alive
Content-Disposition:attachment; filename="report.csv"
Content-Encoding:gzip
Content-Language:de-DE
Content-Type:text/csv; charset=UTF-8
Date:Wed, 22 Jul 2015 12:44:30 GMT
Expires:Thu, 21 Jul 2016 12:44:30 GMT
Keep-Alive:timeout=15, max=75
Pragma:cache
Server:Apache
Transfer-Encoding:chunked
Vary:Accept-Encoding

Now, I want to download and parse this file using PhantomJS. I set page onResourceReceived listener to see if Phantom will receive/download the file.

clientRequests.phantomPage.onResourceReceived = function(response) {
    console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};

When I make Phantom request to download a file (this is page.open('URL OF THE FILE')), I can see in Phantom log that file is downloaded. Here are logs:

"contentType": "text/csv; charset=UTF-8",
    "headers": {
        "name": "Date",
        "value": "Wed, 22 Jul 2015 12:57:41 GMT"
    },
    "name": "Content-Disposition",
    "value": "attachment; filename=\"report.csv\"",
    "status":200,"statusText":"OK"

I received the file and its content, but how to access file data? When I print current PhantomJS page object, I get the HTML of the page A and I don't want that, I want CSV file, which I need to parse using JavaScript.

like image 725
MrD Avatar asked Jul 22 '15 13:07

MrD


1 Answers

I found a solution for PhantomJS. Reading through this discussion I found a jsfiddle which downloads a url via jQuery's ajax method and encodes the file as base64.

The file I wanted to download was plain text (CSV) so I have removed the encoding functions. My target page also already had jQuery included so I didn't need to inject jQuery into the target page.

My code assumes you have already opened the page you want to download the file from using PhantomJS, and that page has jQuery in it. In my case I had to first login to the site in order to get the download link.

var fs = require('fs');

var page=this;

var result = page.evaluate(function() {

    var out;
    $.ajax({
        'async' : false,
        'url' : 'fullurltodownload.csv',
        'success' : function(data, status, xhr) {
            out = data;
        }
    });
    return out;

});

fs.write('mydownloadedfile.csv', result);
like image 180
Matthew Lock Avatar answered Oct 13 '22 22:10

Matthew Lock