Python: Downloading a file from a webpage by clicking on a link

I've looked around the internet for a solution to this, but none seemed applicable here. I'm writing a Python program to predict the next day's stock price using historical data. I don't need all the historical data since inception, as Yahoo Finance provides, only the last 60 days or so. The NASDAQ website provides just the right amount of historical data, so I want to use it.

What I want to do is go to a particular stock's profile on NASDAQ, for example www.nasdaq.com/symbol/amd/historical, and click the "Download this File in Excel Format" link at the very bottom. I inspected the page's HTML to see if there was an actual link I could use with urllib to get the file, but all I found was:

<a id="lnkDownLoad" href="javascript:getQuotes(true);">
                Download this file in Excel Format
            </a>

No link. So my question is: how can I write a Python script that goes to a given stock's NASDAQ page, clicks the "Download this file in Excel Format" link, and actually downloads the file? Most solutions online require you to know the URL where the file is stored, but in this case I don't have access to that. So how do I go about doing this?

How do I extract specific links from a webpage in Python?

To get all links from a webpage: download the page's HTML, create a BeautifulSoup object to parse it, use the soup's findAll method to find every link by its a tag, and store the links in a list.
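As a rough illustration of collecting every link on a page, here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The LinkCollector class and the SAMPLE_HTML string are made up for this example; the second anchor is the one quoted in the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Sample markup (hypothetical page fragment); the second anchor is the
# download link quoted in the question.
SAMPLE_HTML = """
<a href="/symbol/amd">AMD summary</a>
<a id="lnkDownLoad" href="javascript:getQuotes(true);">
    Download this file in Excel Format
</a>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
```

Note that the download anchor only yields javascript:getQuotes(true);, which is why plain link extraction is not enough for this page.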

Using Chrome, go to View > Developer > Developer Tools.
In the Developer Tools panel, switch to the Network tab.
Navigate to the page containing the link you need to click, then click the clear button (the ⃠ symbol) to clear all recent activity.
Click the link and see whether any requests were made to the server.
If there were, click the request and see if you can reverse engineer the API of its endpoint.
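Once the Network tab shows the request behind the link, it can often be replayed from Python. The following is only a sketch using urllib: the URL, the form fields, and the referer are hypothetical placeholders, not the real NASDAQ endpoint, and must be replaced with whatever the Network tab actually shows. It also assumes the request turns out to be a form POST.

```python
import shutil
import urllib.parse
import urllib.request


def build_request(url, form_fields, referer):
    """Recreate a form POST observed in the browser's Network tab."""
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={
            "Referer": referer,
            "User-Agent": "Mozilla/5.0",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def save_response(request, path):
    """Send the request and stream the response body to a local file."""
    with urllib.request.urlopen(request, timeout=30) as response:
        with open(path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)


# Hypothetical values: substitute what the Network tab actually shows.
request = build_request(
    url="https://example.com/hypothetical/export",
    form_fields={"symbol": "amd", "range": "3m"},
    referer="https://example.com/hypothetical/page",
)
print(request.get_method(), request.full_url)
```

Calling save_response(request, "quotes.csv") would then perform the download; it is left out of the sketch so that nothing is sent until the real endpoint and fields are filled in.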

Please be aware that this may be against the website's Terms of Service!

It appears that BeautifulSoup might be the easiest way to do this. I've made a cursory check that the results of the following script are the same as those that appear on the page. You would just have to write the results to a file, rather than print them. However, the columns are ordered differently.

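import requests
from bs4 import BeautifulSoup

URL = 'http://www.nasdaq.com/symbol/amd/historical'

# Fetch the page and parse it (the 'lxml' parser requires the lxml package).
page = requests.get(URL).text
soup = BeautifulSoup(page, 'lxml')

# The historical quotes table sits inside the div with this id.
table_div = soup.find('div', id='historicalContainer')
table_rows = table_div.find_all('tr')

# The first two rows are skipped; they do not hold quote data.
for table_row in table_rows[2:]:
    row = tuple(table_row.get_text().split())
    # Quote the date and the comma-separated volume so each line is valid CSV.
    print('"%s",%s,%s,%s,%s,"%s"' % row)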

Output:

"03/24/2017",14.16,14.18,13.54,13.7,"50,022,400"
"03/23/2017",13.96,14.115,13.77,13.79,"44,402,540"
"03/22/2017",13.7,14.145,13.55,14.1,"61,120,500"
"03/21/2017",14.4,14.49,13.78,13.82,"72,373,080"
"03/20/2017",13.68,14.5,13.54,14.4,"91,009,110"
"03/17/2017",13.62,13.74,13.36,13.49,"224,761,700"
"03/16/2017",13.79,13.88,13.65,13.65,"44,356,700"
"03/15/2017",14.03,14.06,13.62,13.98,"55,070,770"
"03/14/2017",14,14.15,13.6401,14.1,"52,355,490"
"03/13/2017",14.475,14.68,14.18,14.28,"72,917,550"
"03/10/2017",13.5,13.93,13.45,13.91,"62,426,240"
"03/09/2017",13.45,13.45,13.11,13.33,"45,122,590"
"03/08/2017",13.25,13.55,13.1,13.22,"71,231,410"
"03/07/2017",13.07,13.37,12.79,13.05,"76,518,390"
"03/06/2017",13,13.34,12.38,13.04,"117,044,000"
"03/03/2017",13.55,13.58,12.79,13.03,"163,489,100"
"03/02/2017",14.59,14.78,13.87,13.9,"103,970,100"
"03/01/2017",15.08,15.09,14.52,14.96,"73,311,380"
"02/28/2017",15.45,15.55,14.35,14.46,"141,638,700"
"02/27/2017",14.27,15.35,14.27,15.2,"95,126,330"
"02/24/2017",14,14.32,13.86,14.12,"46,130,900"
"02/23/2017",14.2,14.45,13.82,14.32,"79,900,450"
"02/22/2017",14.3,14.5,14.04,14.28,"71,394,390"
"02/21/2017",13.41,14.1,13.4,14,"66,250,920"
"02/17/2017",12.79,13.14,12.6,13.13,"40,831,730"
"02/16/2017",13.25,13.35,12.84,12.97,"52,403,840"
"02/15/2017",13.2,13.44,13.15,13.3,"33,655,580"
"02/14/2017",13.43,13.49,13.19,13.26,"40,436,710"
"02/13/2017",13.7,13.95,13.38,13.49,"57,231,080"
"02/10/2017",13.86,13.86,13.25,13.58,"54,522,240"
"02/09/2017",13.78,13.89,13.4,13.42,"72,826,820"
"02/08/2017",13.21,13.75,13.08,13.56,"75,894,880"
"02/07/2017",14.05,14.27,13.06,13.29,"158,507,200"
"02/06/2017",12.46,13.7,12.38,13.63,"139,921,700"
"02/03/2017",12.37,12.5,12.04,12.24,"59,981,710"
"02/02/2017",11.98,12.66,11.95,12.28,"116,246,800"
"02/01/2017",10.9,12.14,10.81,12.06,"165,784,500"
"01/31/2017",10.6,10.67,10.22,10.37,"51,993,490"
"01/30/2017",10.62,10.68,10.3,10.61,"37,648,430"
"01/27/2017",10.6,10.73,10.52,10.67,"32,563,480"
"01/26/2017",10.35,10.66,10.3,10.52,"35,779,140"
"01/25/2017",10.74,10.975,10.15,10.35,"61,800,440"
"01/24/2017",9.95,10.49,9.95,10.44,"43,858,900"
"01/23/2017",9.68,10.06,9.68,9.91,"27,848,180"
"01/20/2017",9.88,9.96,9.67,9.75,"27,936,610"
"01/19/2017",9.92,10.25,9.75,9.77,"46,087,250"
"01/18/2017",9.54,10.1,9.42,9.88,"51,705,580"
"01/17/2017",10.17,10.23,9.78,9.82,"70,388,000"
"01/13/2017",10.79,10.87,10.56,10.58,"38,344,340"
"01/12/2017",10.98,11.0376,10.33,10.76,"75,178,900"
"01/11/2017",11.39,11.41,11.15,11.2,"39,337,330"
"01/10/2017",11.55,11.63,11.33,11.44,"29,122,540"
"01/09/2017",11.37,11.64,11.31,11.49,"37,215,840"
"01/06/2017",11.29,11.49,11.11,11.32,"34,437,560"
"01/05/2017",11.43,11.69,11.23,11.24,"38,777,380"
"01/04/2017",11.45,11.5204,11.235,11.43,"40,742,680"
"01/03/2017",11.42,11.65,11.02,11.43,"55,114,820"
"12/30/2016",11.7,11.78,11.25,11.34,"44,033,460"
"12/29/2016",11.24,11.62,11.01,11.59,"50,180,310"
"12/28/2016",12.28,12.42,11.46,11.55,"71,072,640"
"12/27/2016",11.65,12.08,11.6,12.07,"44,168,130"

The script wraps dates and thousands-separated volumes in double quotes.
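To write the results to a file rather than print them, one option is the standard csv module, which handles the quoting automatically. This is a minimal sketch: write_rows is a made-up helper name, the two rows are copied from the output above, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_rows(rows, out_file):
    """Write parsed table rows as CSV, quoting only the fields that need it."""
    writer = csv.writer(out_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)


# Two rows in the shape the scraping script produces:
# one tuple of six strings per table row.
rows = [
    ("03/24/2017", "14.16", "14.18", "13.54", "13.7", "50,022,400"),
    ("03/23/2017", "13.96", "14.115", "13.77", "13.79", "44,402,540"),
]

# StringIO stands in for a real file here; for an actual file use
# open("amd_historical.csv", "w", newline="").
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

With QUOTE_MINIMAL only the volume column is quoted, because it contains commas; the dates are written unquoted, which is still valid CSV.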

Tags:

python

html

samz_manu

2 Answers

Julien

Bill Bell
