
Download a csv from url and make it a dataframe python pandas

I am new to Python, so I need a little help here. I have a dataframe with a URL column, and each link downloads a CSV. My aim is to create a loop (or whatever works) so that a single command downloads and reads each CSV and creates a dataframe for each row. Any help would be appreciated. I have attached part of the dataframe below. If the link doesn't work (it probably won't), you can replace it with a link from 'https://finance.yahoo.com/quote/GOOG/history?p=GOOG' (any other company works too): navigate to the CSV download and use that link.

Dataframe:

Symbol         Link
YI             https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH            https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN           https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS           https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E

Thanks again.

cloudly lemons asked Nov 05 '18 16:11





1 Answer

There are multiple ways to get CSV data from URLs. In your example, Yahoo Finance, you can copy the historical data download link and pass it straight to pandas:

...
HISTORICAL_URL = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1582781719&period2=1614404119&interval=1d&events=history&includeAdjustedClose=true"

df = pd.read_csv(HISTORICAL_URL)

A general pattern could involve a tool like requests or httpx to make a GET or POST request, then wrapping the response contents in an io buffer that pandas can read.

import pandas as pd
import requests
import io

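url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
# Parameters copied from the download link (everything after '?')
params = {
    'period1': 1538761929,
    'period2': 1541443929,
    'interval': '1d',
    'events': 'history',
    'crumb': 'v4z6ZpmoP98',
}

r = requests.post(url, data=params)
if r.ok:
    # Decode the response body and parse it as CSV from an in-memory buffer
    data = r.content.decode('utf8')
    df = pd.read_csv(io.StringIO(data))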

To get the params, I just followed the link and copied everything after ‘?’. Check that they match ;)
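To apply the same pattern to every row of the dataframe in the question, a minimal sketch could look like the one below. It assumes the dataframe has the 'Symbol' and 'Link' columns shown in the question. The names fetch_csv_text and frames_by_symbol are hypothetical helpers, not part of pandas or requests. It uses requests.get because the links already carry their parameters in the query string. Whether Yahoo accepts a given link (for example, an expired crumb) is outside what this sketch handles, so failed downloads are simply skipped.

```python
import io

import pandas as pd
import requests


def fetch_csv_text(url, timeout=10):
    """Download a URL and return its body as text (raises on HTTP errors)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def frames_by_symbol(links, fetch=fetch_csv_text):
    """Return {symbol: DataFrame} for a frame with 'Symbol' and 'Link' columns."""
    frames = {}
    for symbol, url in zip(links["Symbol"], links["Link"]):
        try:
            text = fetch(url)
        except requests.RequestException as exc:
            # Skip links that fail instead of stopping the whole loop
            print(f"skipping {symbol}: {exc}")
            continue
        frames[symbol] = pd.read_csv(io.StringIO(text))
    return frames
```

With the question's dataframe stored in a variable, say links_df, calling frames_by_symbol(links_df) would give a dictionary in which frames["YI"] is the dataframe for the first row, provided that download succeeds. Keeping the frames in a dictionary keyed by symbol avoids creating one variable per row.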


Update:


If you can see the raw CSV contents directly at the URL, just pass the URL to pd.read_csv. Example with data read directly from a URL:

data_url = 'https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'

df = pd.read_csv(data_url)
Prayson W. Daniel answered Nov 14 '22 23:11
