How to Access Private GitHub Repo File (.csv) in Python using Pandas or Requests

I had to switch my public GitHub repository to private, and now I cannot access its files from Python the way I could when the repo was public, even with an access token.

I can access my private repo's CSV with curl:

    curl -s https://{token}@raw.githubusercontent.com/username/repo/master/file.csv

However, I want to access this information in my Python file. When the repo was public I could simply use:

    url = 'https://raw.githubusercontent.com/username/repo/master/file.csv'
    df = pd.read_csv(url, error_bad_lines=False)

This no longer works now that the repo is private, and I cannot find a workaround to download this CSV in Python instead of pulling it from the terminal.

If I try:

    requests.get('https://{token}@raw.githubusercontent.com/username/repo/master/file.csv')

I get a 404 response, which is basically the same thing that happens with pd.read_csv(). If I click on the raw file, I see that a temporary token is created and the URL is:

    https://raw.githubusercontent.com/username/repo/master/file.csv?token=TEMPTOKEN

Is there a way to attach my permanent personal access token so that I can always pull this data from GitHub?

asked Jun 03 '20 02:06

everwitt7

3 Answers

Yes, you can download the CSV file in Python instead of pulling it from the terminal. To do that, use the GitHub API v3 with the help of the 'requests' and 'io' modules. A reproducible example is below.

import numpy as np
import pandas as pd
import requests
from io import StringIO

# Create CSV file
df = pd.DataFrame(np.random.randint(2,size=10_000).reshape(1_000,10))
df.to_csv('filename.csv') 

# -> now upload file to private github repo

# define parameters for a request
token = 'paste-your-personal-access-token-here'
owner = 'repository-owner-name'
repo = 'repository-name-where-data-is-stored'
path = 'filename.csv'

# send a request
r = requests.get(
    'https://api.github.com/repos/{owner}/{repo}/contents/{path}'.format(
    owner=owner, repo=repo, path=path),
    headers={
        'accept': 'application/vnd.github.v3.raw',
        'authorization': 'token {}'.format(token)
            }
    )

# convert string to StringIO object
string_io_obj = StringIO(r.text)

# Load data to df
df = pd.read_csv(string_io_obj, sep=",", index_col=0)

# optionally write df to CSV
df.to_csv("file_name_02.csv")
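
The request above can also be wrapped in a small reusable function. The following is only a minimal sketch under a few assumptions: the name `read_private_csv` is hypothetical, the optional `ref` argument is passed as the contents API's `ref` query parameter to pick a branch, tag, or commit, and the 30-second timeout is an arbitrary choice. It also calls `raise_for_status()` so that an error response (such as the 404 from the question) is not silently parsed as CSV.

```python
from io import StringIO

import pandas as pd
import requests


def read_private_csv(owner, repo, path, token, ref=None, **read_csv_kwargs):
    """Fetch a CSV from a private repo via the contents API and parse it.

    Hypothetical helper. `ref` is an optional branch, tag, or commit SHA;
    extra keyword arguments are forwarded to pandas.read_csv.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    response = requests.get(
        url,
        headers={
            # Ask for the raw file body instead of the JSON metadata
            "Accept": "application/vnd.github.v3.raw",
            "Authorization": f"token {token}",
        },
        params={"ref": ref} if ref else None,
        timeout=30,  # arbitrary value chosen for this sketch
    )
    # Raise on 4xx/5xx instead of handing an error body to read_csv
    response.raise_for_status()
    return pd.read_csv(StringIO(response.text), **read_csv_kwargs)
```

With the parameters defined earlier, a call might look like `df = read_private_csv(owner, repo, path, token, index_col=0)`.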

answered Oct 20 '22 11:10

John Smith

This is what ended up working for me - leaving it here if anyone runs into the same issue. Thanks for the help!

    import io

    import pandas
    import requests

    user = 'my_github_username'
    pao = 'my_pao'  # your personal access token

    github_session = requests.Session()
    github_session.auth = (user, pao)

    # providing raw url to download csv from github
    csv_url = 'https://raw.githubusercontent.com/user/repo/master/csv_name.csv'

    download = github_session.get(csv_url).content
    # on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
    downloaded_csv = pandas.read_csv(io.StringIO(download.decode('utf-8')), on_bad_lines='skip')
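
Since the token is hard-coded in the snippet above, one option is to read it from the environment instead, so it never ends up committed to the repository. This is just a sketch: the helper name `load_github_token` and the variable name `GITHUB_TOKEN` are assumptions, not anything GitHub or requests requires.

```python
import os


def load_github_token(env_var="GITHUB_TOKEN"):
    """Return the token stored in `env_var`, or raise a clear error.

    Hypothetical helper; the default variable name is only a convention.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a personal access token."
        )
    return token
```

The session setup would then become `github_session.auth = (user, load_github_token())`.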

answered Oct 20 '22 12:10

everwitt7

Adding another working example:

import requests
from requests.structures import CaseInsensitiveDict

# Variables
GITHUB_TOKEN = "my-personal-access-token"  # placeholder for your personal access token
GH_PREFIX = "https://raw.githubusercontent.com"
ORG = "my-user-name"
REPO = "my-repo-name"
BRANCH = "main"
FOLDER = "some-folder"
FILE = "some-file.csv"
URL = GH_PREFIX + "/" + ORG + "/" + REPO + "/" + BRANCH + "/" + FOLDER + "/" + FILE

# Headers setup
headers = CaseInsensitiveDict()
headers["Authorization"] = "token " + GITHUB_TOKEN

# Execute and view status
resp = requests.get(URL, headers=headers)
if resp.status_code == 200:
    print(resp.content)
else:
    print("Request failed!")
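
When the request fails, the status code usually says why. GitHub answers with 404 rather than 403 for private repositories the caller cannot see, which is why a missing or wrong token looks like a missing file. The mapping below is only a rough sketch of likely causes; `STATUS_HINTS` and `describe_failure` are hypothetical names, and the hints are guesses rather than an exhaustive list.

```python
# Hypothetical helper: maps common status codes to a likely cause.
STATUS_HINTS = {
    401: "bad or expired token",
    403: "token lacks permission for this repository, or a rate limit was hit",
    404: "wrong URL or branch, or the token cannot see this private repository",
}


def describe_failure(status_code):
    """Return a short, human-readable guess at why a request failed."""
    hint = STATUS_HINTS.get(status_code, "unexpected response")
    return f"Request failed with HTTP {status_code}: {hint}"
```

In the `else` branch above, `print(describe_failure(resp.status_code))` would give a more useful message than a bare "Request failed!".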

answered Oct 20 '22 11:10

RtmY

Tags: git, python, private, pandas, csv
