I want to download several data files from this URL: https://pselookup.vrymel.com/
The site contains a date field and a download button. I want to download data for multiple years (which would mean a lot of requests), and I want to automate it.
I've created a JavaScript snippet, however, it keeps downloading the same file over and over again.
$dateField = document.getElementsByClassName('csv_download_input__Input-encwx-1 dDiqPH')[2]
$dlButton = document.getElementsByClassName('csv_download_input__Button-encwx-0 KLfyv')[2]
var now = new Date();
var daysOfYear = [];
for (var d = new Date(2016, 0, 1); d <= now; d.setDate(d.getDate() + 1)) {
    daysOfYear.push(new Date(d).toISOString().substring(0,10));
}
(function theLoop (i) {
    setTimeout(function () {
        $dlButton.click()
        $dateField.value = daysOfYear[i]
        if (--i) { // If i > 0, keep going
            theLoop(i); // Call the loop again, and pass it the current value of i
        }
    }, 3000);
})(daysOfYear.length - 1);
How could I download all of the files automatically?
There are no built-in actions that will do that, but you might want to take a look at Power Automate UI Flows. Using a UI Flow you can easily program a browser to log in and download the file you want.
To allow or block automatic file downloads for all apps: click Start > Settings > Privacy, scroll down on the left and click on Automatic file downloads, then click on Allow.
First off, client-side JavaScript is probably not the best language nor the best approach for this. It might work, but it helps to know what the best option is when choosing how to tackle a problem. It will also save you from clicking through roughly 800 popups accepting the download.
You can get the files programmatically by learning what your browser does to fetch the file and reproducing that in bulk.
After inspecting the network calls, you can see that the page calls an endpoint, and that endpoint returns a link to the file you can download.
That makes things easy: now you just need a script, in any language, that retrieves all of those files.
I've chosen JavaScript again, but this time running on Node.js rather than in the browser, which means it runs from your computer. You could do the same with Bash, Python, or any other language.
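Before wiring up the full loop, you can check what the endpoint returns for a single day. Here's a minimal sketch using axios (the date in the URL is just an example; download_url is the field the full script below relies on):

const axios = require('axios');

// Ask the endpoint for one day and print the download link it returns
axios.get('https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/2016-01-04')
    .then(response => console.log(response.data.download_url))
    .catch(console.error);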
To run the full script, do the following:
npm install axios
Save the script below as crawler.js
node crawler.js
This has been tested using Node v8.15.0.
// NOTE: Require this to make a request and save the link as file 20190813:Alevale
const axios = require('axios');
const fs = require('fs');

const now = new Date();
const daysOfYear = [];
const baseUrl = 'https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/'

// Build the list of dates (YYYY-MM-DD) from 2016-01-01 up to today
for (let d = new Date(2016, 0, 1); d <= now; d.setDate(d.getDate() + 1)) {
    daysOfYear.push(new Date(d).toISOString().substring(0, 10));
}

const waitFor = (time) => {
    return new Promise(resolve => setTimeout(resolve, time))
}

const getUrls = async () => {
    for (const day of daysOfYear) {
        console.log('getting day', baseUrl + day)
        // NOTE: Throttle the calls to not overload the server 20190813:Alevale
        await waitFor(4000)
        await axios.get(baseUrl + day)
            .then(response => {
                console.log(response.data);
                if (response.data && response.data.download_url) {
                    return response.data.download_url
                }
                return Promise.reject('Could not retrieve response.data.download_url')
            })
            .then((url) => {
                // Return the download promise so the loop waits for this file before moving on
                return axios({
                    method: 'get',
                    url,
                    responseType: 'stream'
                })
                    .then(function (response) {
                        // NOTE: Save the file as 2019-08-13 20190813:Alevale
                        response.data.pipe(fs.createWriteStream(`${day}.csv`))
                    })
                    .catch(console.error)
            })
            .catch(error => {
                console.log(error);
            });
    }
}

getUrls()
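One optional addition, not part of the original script: since each file is saved as <day>.csv in the working directory, you could skip days that already exist on disk so an interrupted run can be resumed without re-downloading everything. A minimal sketch of that check:

// Hypothetical helper: skip a day if its CSV is already on disk
const alreadyDownloaded = (day) => fs.existsSync(`${day}.csv`);

// Inside getUrls(), at the top of the loop:
// if (alreadyDownloaded(day)) continue;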