I have written some code to scrape BTC/ETH time series from investing.com and it works fine. However, I need to alter the requests call so that the downloaded data comes from Kraken rather than the Bitfinex default, and starts from 01/06/2016 instead of the default start date. These options can be set manually on the web page, but I have no idea how to send them via the requests call, except that it may involve using the "data" parameter. Grateful for any advice.
Thanks,
KM
The code is already written in Python and works fine for the defaults:
import requests
from bs4 import BeautifulSoup
import os
import numpy as np
# BTC scrape https://www.investing.com/crypto/bitcoin/btc-usd-historical-data
# ETH scrape https://www.investing.com/crypto/ethereum/eth-usd-historical-data
ticker_list = [x.strip() for x in open("F:\\System\\PVWAVE\\Crypto\\tickers.txt", "r").readlines()]
urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
print("Number of tickers: ", len(ticker_list))
for ticker in ticker_list:
    print(ticker)
    url = "https://www.investing.com/crypto/"+ticker+"-historical-data"
    # Plain GET: returns the default exchange and default date range
    req = requests.get(url, headers=urlheader)
    soup = BeautifulSoup(req.content, "lxml")
    table = soup.find('table', id="curr_table")
    split_rows = table.find_all("tr")
    newticker=ticker.replace('/','\\')
    output_filename = "F:\\System\\PVWAVE\\Crypto\\{0}.csv".format(newticker)
    os.makedirs(os.path.dirname(output_filename), exist_ok=True)
    output_file = open(output_filename, 'w')
    # First row is the table header; the remaining rows are reversed into chronological order
    header_list = split_rows[0:1]
    split_rows_rev = split_rows[:0:-1]
    for row in header_list:
        columns = list(row.stripped_strings)
        columns = [column.replace(',','') for column in columns]
        if len(columns) == 7:
            output_file.write("{0}, {1}, {2}, {3}, {4}, {5}, {6} \n".format(columns[0], columns[2], columns[3], columns[4], columns[1], columns[5], columns[6]))
    for row in split_rows_rev:
        columns = list(row.stripped_strings)
        columns = [column.replace(',','') for column in columns]
        if len(columns) == 7:
            output_file.write("{0}, {1}, {2}, {3}, {4}, {5}, {6} \n".format(columns[0], columns[2], columns[3], columns[4], columns[1], columns[5], columns[6]))
    output_file.close()
Data is downloaded for the default exchange and default date range, but I want to specify Kraken and my own start and end dates (01/06/16 and the last full day, i.e. always yesterday).
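Many websites use forms to send data to the server in response to user activity, such as a login page where you enter your username and password, or a button you click. Something like that is going on here: when you change the exchange or the date range on the page, the browser sends a form request, which you can inspect in the Network panel of your browser's developer tools.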
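As a rough illustration of what "sending a form" means, the sketch below URL-encodes two of the fields from the payload used later in this answer, using only the standard library. It is not the site's actual request, just the general shape of a form body; requests does an equivalent encoding when you pass a dict as data.

```python
from urllib.parse import urlencode, parse_qs

# Two fields borrowed from the payload in this answer, for illustration only.
form_fields = {"action": "historical_data", "st_date": "12/01/2018"}

# A form POST carries its fields URL-encoded in the request body
# (Content-Type: application/x-www-form-urlencoded).
body = urlencode(form_fields)
print(body)

# The receiving side decodes the body back into key/value pairs.
decoded = parse_qs(body)
print(decoded)
```

Note that the slashes in the date are percent-encoded in the body; you do not need to do this yourself when using requests.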
You need to make three changes to your Python code:
Change the request from GET to POST.
Send the Form Data as the payload of that request.
Change the URL to the request URL shown in the Headers tab of your browser's developer tools.
url = "https://www.investing.com/instruments/HistoricalDataAjax"
payload = {'header': 'BTC/USD Kraken Historical Data', 'st_date': '12/01/2018', 'end_date': '12/01/2018', 'sort_col': 'date', 'action': 'historical_data', 'smlID': '145284', 'sort_ord': 'DESC', 'interval_sec': 'Daily', 'curr_id': '49799'}
# Assign the response to req so the rest of the parsing code can use it unchanged
req = requests.post(url, data=payload, headers=urlheader)
Make the above-mentioned changes and leave the rest of your code as it is; you should then get the results you want. You can also change the dates in the payload to suit your needs.
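Since the question asks for a fixed start date and an end date that is always yesterday, here is a minimal sketch of building the payload with computed dates. The helper name build_payload is hypothetical. Two things are assumptions: that the form expects dates as MM/DD/YYYY (the sample value 12/01/2018 is ambiguous), and that 01/06/2016 in the question means 1 June 2016. Check both against the Form Data your browser actually sends.

```python
from datetime import date, timedelta

# Assumption: the form expects MM/DD/YYYY; verify against the real Form Data.
DATE_FORMAT = "%m/%d/%Y"

def build_payload(start, today=None):
    """Hypothetical helper: payload from `start` up to the last full day."""
    today = today or date.today()
    end = today - timedelta(days=1)  # yesterday
    return {
        'header': 'BTC/USD Kraken Historical Data',
        'st_date': start.strftime(DATE_FORMAT),
        'end_date': end.strftime(DATE_FORMAT),
        'sort_col': 'date',
        'action': 'historical_data',
        'smlID': '145284',
        'sort_ord': 'DESC',
        'interval_sec': 'Daily',
        'curr_id': '49799',
    }

# Assumption: 01/06/2016 in the question is day-first, i.e. 1 June 2016.
payload = build_payload(date(2016, 6, 1))
print(payload['st_date'], payload['end_date'])
```

The resulting payload can be passed to requests.post exactly as in the snippet above. The other field values are copied from this answer and apply to BTC/USD on Kraken.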