I'm trying to pull data from the following sample web page with Google Apps Script, using UrlFetchApp.fetch(url):
url = http://www.premierleague.com/players/2064/Wayne-Rooney/stats?se=54
The problem is that when I call UrlFetchApp.fetch(url), I don't get the page content selected by the 'se' parameter in the URL. Instead, I get the content of the following URL, apparently because the 'se=54' data is loaded asynchronously: http://www.premierleague.com/players/2064/Wayne-Rooney/stats
Is there any other way to pass the 'se' parameter? I looked at the function, and it accepts an 'options' argument, but the documentation on that is very limited.
Any help would be most appreciated. Many thanks,
Tommy
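As far as I know, the options object of UrlFetchApp.fetch (headers, method, payload, followRedirects, muteHttpExceptions and so on) has no separate field for query-string parameters, so for a GET request a parameter like 'se' can only travel in the URL itself, as it already does here. The sketch below shows this with a hypothetical helper, buildUrl, that appends encoded parameters to a base URL. It does not change what the server sends back, so the asynchronously loaded season data is still missing from the returned HTML.

```javascript
// Hypothetical helper: appends URL-encoded query parameters to a base URL.
// Assumes the base URL has no #fragment.
function buildUrl(base, params) {
  var query = Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    })
    .join("&");
  if (!query) {
    return base; // nothing to append
  }
  // Use "?" for the first parameter, "&" if the URL already has a query string.
  return base + (base.indexOf("?") === -1 ? "?" : "&") + query;
}
```

For example, buildUrl("http://www.premierleague.com/players/2064/Wayne-Rooney/stats", { se: 54 }) produces the same URL as in the question.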
Go to that website in your browser and open the developer tools (F12 or Ctrl+Shift+I). Click the Network tab and reload the page with F5. A list of requests will appear; near the bottom you should see the asynchronous requests that fetch the statistics. Those requests get the data as JSON from footballapi.pulselive.com. You can make the same request in Apps Script, but you have to send a correct "Origin" header or the request gets rejected. Here is an example:
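function fetchData() {
  // Stats endpoint that the premierleague.com page calls in the background (found in the Network tab).
  var url = "http://footballapi.pulselive.com/football/stats/player/2064?comps=1";
  var options = {
    // Without the site's own Origin header, the API rejects the request.
    "headers": {
      "Origin": "http://www.premierleague.com"
    }
  };
  // The response body is JSON, so parse it into an object.
  var json = JSON.parse(UrlFetchApp.fetch(url, options).getContentText());
  // Log only the "goals" entry of the stats array.
  for (var i = 0; i < json.stats.length; i++) {
    if (json.stats[i].name === "goals") Logger.log(json.stats[i]);
  }
}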
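The lookup in that loop can be pulled out into a small pure function, which makes it testable outside Apps Script with a hand-written object. This is a sketch assuming the response has the shape the loop above relies on, an object with a stats array whose entries carry a name field; findStat is a hypothetical name, and any other fields in the entries are not assumed.

```javascript
// Hypothetical helper: returns the first entry in json.stats whose name
// equals statName, or null if there is no such entry (or no stats array).
function findStat(json, statName) {
  var stats = (json && json.stats) || [];
  for (var i = 0; i < stats.length; i++) {
    if (stats[i].name === statName) {
      return stats[i];
    }
  }
  return null;
}
```

Inside fetchData, the loop could then become Logger.log(findStat(json, "goals"));, keeping in mind that it returns only the first match rather than logging every matching entry.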
Please try the following solution:
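var options = {
  "method": "GET",             // GET is the default method
  "followRedirects": true,     // follow HTTP 3xx redirects (also the default)
  "muteHttpExceptions": true   // return 4xx/5xx responses instead of throwing
};
var result = UrlFetchApp.fetch(url, options);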
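These options control how the request is sent and how errors are reported; UrlFetchApp still does not run the page's JavaScript. Because muteHttpExceptions: true makes fetch return 4xx/5xx responses instead of throwing, the status code has to be checked by hand. A minimal sketch, using the HTTPResponse methods getResponseCode() and getContentText(); the helper name parseJsonResponse is hypothetical, and it assumes the body is JSON (as with the pulselive endpoint, not the HTML page).

```javascript
// Hypothetical helper: checks the HTTP status of a UrlFetchApp response
// before parsing its body as JSON. Needed because muteHttpExceptions: true
// suppresses the exception fetch would otherwise throw on 4xx/5xx.
function parseJsonResponse(response) {
  var code = response.getResponseCode();
  if (code < 200 || code >= 300) {
    // Include the start of the body to make failures easier to diagnose.
    throw new Error("Request failed with HTTP " + code + ": " +
        response.getContentText().slice(0, 200));
  }
  return JSON.parse(response.getContentText());
}
```

Usage: var data = parseJsonResponse(UrlFetchApp.fetch(url, options));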