
How to automate downloading a file from a site?

I want to download several data files from this URL: https://pselookup.vrymel.com/

The site contains a date field and a download button. I want to download data for multiple years (which would mean a lot of requests), and I want to do it automatically.

I've written a JavaScript snippet, but it keeps downloading the same file over and over again.

$dateField = document.getElementsByClassName('csv_download_input__Input-encwx-1 dDiqPH')[2]

$dlButton = document.getElementsByClassName('csv_download_input__Button-encwx-0 KLfyv')[2]

var now = new Date();
var daysOfYear = [];
for (var d = new Date(2016, 0, 1); d <= now; d.setDate(d.getDate() + 1)) {
    daysOfYear.push(new Date(d).toISOString().substring(0,10));
}

(function theLoop (i) {
  setTimeout(function () {
    $dlButton.click()
    $dateField.value = daysOfYear[i]
    if (--i) {          // If i > 0, keep going
      theLoop(i);       // Call the loop again, and pass it the current value of i
    }
  }, 3000);
})(daysOfYear.length-1);

How could I download all of the files automatically?

bloodfire1004 asked Jun 14 '19 04:06




1 Answer

First off, JavaScript in the browser is probably neither the best language nor the best approach for this. It might work, but it helps to know the better option before settling on an approach. Running a script outside the browser also saves you from clicking through the download confirmation popup ~800 times.

You can get the files programmatically by working out what your browser does to fetch a file and then reproducing that in bulk.

After inspecting the network calls, you can see that the page calls an endpoint, and that endpoint returns a link to the file you can download.
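As a rough sketch of that two-step lookup (ask the endpoint for a day, then read the file link out of its answer), something like the following should work. The base URL and the `download_url` field are the ones used in the script further down; the function names `endpointFor`, `extractDownloadUrl` and `resolveDownloadUrl` are hypothetical and only here for illustration.

```javascript
// Base URL and `download_url` field as used in the full script in this answer.
// All function names below are hypothetical.
const BASE_URL = 'https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/';

// Step 1: build the per-day endpoint URL, e.g. .../eod/2019-08-13
function endpointFor(day) {
    return BASE_URL + day;
}

// Step 2: pull the file link out of the endpoint's JSON payload
function extractDownloadUrl(payload) {
    if (payload && typeof payload.download_url === 'string') {
        return payload.download_url;
    }
    throw new Error('No download_url in response');
}

// Ties both steps together. `getJson` is passed in so the flow can be
// tried with a fake fetcher, without touching the network.
async function resolveDownloadUrl(day, getJson) {
    const payload = await getJson(endpointFor(day));
    return extractDownloadUrl(payload);
}
```

With axios, `getJson` could be as simple as `(url) => axios.get(url).then((r) => r.data)`.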

That makes things easy: all you need now is a script, in any language, that retrieves those links and downloads the files.

I've chosen JavaScript, but on Node.js rather than in the browser, which means the script has to run from your computer.

You could do the same with Bash, Python or any other language.

To run this, do the following:

  • Go to a new empty directory
  • Run npm install axios
  • Create a file containing the code below; let's call it crawler.js
  • Run node crawler.js

This has been tested with Node v8.15.0.

// NOTE: axios makes the requests, fs saves each download as a file 20190813:Alevale
const axios = require('axios');
const fs = require('fs');

const now = new Date();
const daysOfYear = [];
const baseUrl = 'https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/';

// Build the dates in UTC so toISOString() does not shift each day back by one
// on machines whose time zone is ahead of UTC.
for (let d = new Date(Date.UTC(2016, 0, 1)); d <= now; d.setUTCDate(d.getUTCDate() + 1)) {
    daysOfYear.push(d.toISOString().substring(0, 10));
}

const waitFor = (time) => new Promise((resolve) => setTimeout(resolve, time));

const getUrls = async () => {
    // `const day` gives every iteration its own binding, so a slow download
    // cannot end up saved under a later day's file name.
    for (const day of daysOfYear) {
        console.log('getting day', baseUrl + day);
        // NOTE: Throttle the calls to not overload the server 20190813:Alevale
        await waitFor(4000);

        try {
            // The endpoint answers with JSON that holds the actual file link
            const response = await axios.get(baseUrl + day);
            console.log(response.data);
            if (!response.data || !response.data.download_url) {
                throw new Error('Could not retrieve response.data.download_url');
            }

            const file = await axios({
                method: 'get',
                url: response.data.download_url,
                responseType: 'stream'
            });

            // NOTE: Save the file as 2019-08-13.csv 20190813:Alevale
            // Wait until the file is fully written before moving on.
            await new Promise((resolve, reject) => {
                file.data.pipe(fs.createWriteStream(`${day}.csv`))
                    .on('finish', resolve)
                    .on('error', reject);
            });
        } catch (error) {
            // Log the failure and continue with the next day
            console.error(error);
        }
    }
};

getUrls();
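
If you only want part of that range (a single year, say), the date list can be pulled into a small helper. This is just a sketch: `isoDaysBetween` is a hypothetical name, and it assumes both arguments are 'YYYY-MM-DD' strings. It does all date arithmetic in UTC, which avoids the off-by-one day that mixing local-time Date constructors with toISOString() can produce in time zones ahead of UTC.

```javascript
// Hypothetical helper: every day from startIso to endIso (inclusive)
// as 'YYYY-MM-DD' strings, computed entirely in UTC.
function isoDaysBetween(startIso, endIso) {
    const days = [];
    const end = new Date(endIso + 'T00:00:00Z');
    for (let d = new Date(startIso + 'T00:00:00Z'); d <= end; d.setUTCDate(d.getUTCDate() + 1)) {
        days.push(d.toISOString().substring(0, 10));
    }
    return days;
}

console.log(isoDaysBetween('2016-02-27', '2016-03-01'));
// prints [ '2016-02-27', '2016-02-28', '2016-02-29', '2016-03-01' ]
```

The returned array can take the place of daysOfYear in the script above.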
Alejandro Vales answered Oct 25 '22 03:10
