Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Access Private Github Repo File (.csv) in Python using Pandas or Requests

I had to switch my public Github repository to private and cannot access files, not with access tokens that I was able to with the public Github repo.

I can access my private repo's CSV with curl: ''' curl -s https://{token}@raw.githubusercontent.com/username/repo/master/file.csv

'''

However, I want to access this information in my python file. When the repo was public I could simply use: ''' url = 'https://raw.githubusercontent.com/username/repo/master/file.csv' df = pd.read_csv(url, error_bad_lines=False)

'''

This no longer works now that the repo is private, and I cannot find a work around to download this CSV in python instead of pulling from terminal.

If I try: ''' requests.get(https://{token}@raw.githubusercontent.com/username/repo/master/file.csv) ''' I get a 404 response, which is basically the same thing that is happening with the pd.read_csv(). If I click on the raw file I see that a temporary token is created and the URL is: ''' https://raw.githubusercontent.com/username/repo/master/file.csv?token=TEMPTOKEN ''' Is there a way to attach my permanent private access token so that I can always pull this data from github?

like image 831
everwitt7 Avatar asked Jun 03 '20 02:06

everwitt7


People also ask

How do I request access to a private repository GitHub?

Under your repository name, click Settings. In the "Access" section of the sidebar, click Collaborators & teams. Click Invite a collaborator. In the search field, start typing the name of person you want to invite, then click a name in the list of matches.


Video Answer


3 Answers

Yes, you may download CSV file in Python instead of pulling from terminal. To achieve that you may use GitHub API v3 with 'requests' and 'io' modules assistance. Reproducible example below.

import numpy as np
import pandas as pd
import requests
from io import StringIO

# Create CSV file
df = pd.DataFrame(np.random.randint(2,size=10_000).reshape(1_000,10))
df.to_csv('filename.csv') 

# -> now upload file to private github repo

# define parameters for a request
token = 'paste-there-your-personal-access-token' 
owner = 'repository-owner-name'
repo = 'repository-name-where-data-is-stored'
path = 'filename.csv'

# send a request
r = requests.get(
    'https://api.github.com/repos/{owner}/{repo}/contents/{path}'.format(
    owner=owner, repo=repo, path=path),
    headers={
        'accept': 'application/vnd.github.v3.raw',
        'authorization': 'token {}'.format(token)
            }
    )

# convert string to StringIO object
string_io_obj = StringIO(r.text)

# Load data to df
df = pd.read_csv(string_io_obj, sep=",", index_col=0)

# optionally write df to CSV
df.to_csv("file_name_02.csv")
like image 154
John Smith Avatar answered Oct 20 '22 11:10

John Smith


This is what ended up working for me - leaving it here if anyone runs into the same issue. Thanks for the help!

    import json, requests, urllib, io

    user='my_github_username'
    pao='my_pao'

    github_session = requests.Session()
    github_session.auth = (user, pao)

    # providing raw url to download csv from github
    csv_url = 'https://raw.githubusercontent.com/user/repo/master/csv_name.csv'

    download = github_session.get(csv_url).content
    downloaded_csv = pandas.read_csv(io.StringIO(download.decode('utf-8')), error_bad_lines=False)
like image 38
everwitt7 Avatar answered Oct 20 '22 12:10

everwitt7


Adding another working example:

import requests
from requests.structures import CaseInsensitiveDict

# Variables
GH_PREFIX = "https://raw.githubusercontent.com"
ORG = "my-user-name"
REPO = "my-repo-name"
BRANCH = "main"
FOLDER = "some-folder"
FILE = "some-file.csv"
URL = GH_PREFIX + "/" + ORG + "/" + REPO + "/" + BRANCH + "/" + FOLDER + "/" + FILE

# Headers setup
headers = CaseInsensitiveDict()
headers["Authorization"] = "token " + GITHUB_TOKEN

# Execute and view status
resp = requests.get(URL, headers=headers)
if resp.status_code == 200:
   print(resp.content)
else:
   print("Request failed!")
like image 1
RtmY Avatar answered Oct 20 '22 11:10

RtmY