I had to switch my public GitHub repository to private, and now I cannot access its files, not even with access tokens, the way I could when the repo was public.
I can access my private repo's CSV with curl:
```
curl -s https://{token}@raw.githubusercontent.com/username/repo/master/file.csv
```
However, I want to access this information in my Python file. When the repo was public I could simply use:
```
url = 'https://raw.githubusercontent.com/username/repo/master/file.csv'
df = pd.read_csv(url, error_bad_lines=False)
```
This no longer works now that the repo is private, and I cannot find a workaround to download this CSV in Python instead of pulling it from the terminal.
If I try:
```
requests.get('https://{token}@raw.githubusercontent.com/username/repo/master/file.csv')
```
I get a 404 response, which is basically the same thing that happens with pd.read_csv(). If I click on the raw file, I see that a temporary token is created and the URL is:
```
https://raw.githubusercontent.com/username/repo/master/file.csv?token=TEMPTOKEN
```
Is there a way to attach my permanent private access token so that I can always pull this data from GitHub?
Under your repository name, click Settings. In the "Access" section of the sidebar, click Collaborators & teams. Click Invite a collaborator. In the search field, start typing the name of the person you want to invite, then click a name in the list of matches.
Yes, you can download the CSV file in Python instead of pulling it from the terminal. To do that, you can use the GitHub API v3 together with the 'requests' and 'io' modules. A reproducible example is below.
import numpy as np
import pandas as pd
import requests
from io import StringIO
# Create CSV file
df = pd.DataFrame(np.random.randint(2,size=10_000).reshape(1_000,10))
df.to_csv('filename.csv')
# -> now upload file to private github repo
# define parameters for a request
token = 'paste-there-your-personal-access-token'
owner = 'repository-owner-name'
repo = 'repository-name-where-data-is-stored'
path = 'filename.csv'
# send a request
r = requests.get(
    'https://api.github.com/repos/{owner}/{repo}/contents/{path}'.format(
        owner=owner, repo=repo, path=path),
    headers={
        'accept': 'application/vnd.github.v3.raw',
        'authorization': 'token {}'.format(token)
    }
)
# convert string to StringIO object
string_io_obj = StringIO(r.text)
# Load data to df
df = pd.read_csv(string_io_obj, sep=",", index_col=0)
# optionally write df to CSV
df.to_csv("file_name_02.csv")
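The request above targets the repository's default branch and assumes a path without special characters. As a minimal sketch, the URL and headers could be built by two small helpers. `contents_api_url` and `github_headers` are hypothetical names chosen for illustration (they are not part of `requests` or of the GitHub API), and the optional `ref` query parameter is what the Contents API uses to select a branch, tag, or commit.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.github.com"


def contents_api_url(owner, repo, path, ref=None):
    """Build the Contents API URL for one file; `ref` is a branch, tag, or commit SHA."""
    # Percent-encode each part, but keep "/" so nested file paths stay intact.
    url = "{}/repos/{}/{}/contents/{}".format(
        API_ROOT, quote(owner, safe=""), quote(repo, safe=""), quote(path, safe="/")
    )
    if ref:
        url += "?" + urlencode({"ref": ref})
    return url


def github_headers(token):
    """Headers asking for the raw file body, authenticated with a personal access token."""
    return {
        "Accept": "application/vnd.github.v3.raw",
        "Authorization": "token {}".format(token),
    }


# Usage, with `requests`, `pandas` and `StringIO` imported as in the example above:
#   r = requests.get(contents_api_url(owner, repo, path, ref="master"),
#                    headers=github_headers(token))
#   r.raise_for_status()  # stop on 401/404 instead of parsing an error body as CSV
#   df = pd.read_csv(StringIO(r.text), index_col=0)
```

Calling `raise_for_status()` before parsing is a design choice worth keeping: without it, a bad token or a wrong path hands the API's JSON error message to `pd.read_csv`, which then fails in a less obvious way.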
This is what ended up working for me - leaving it here if anyone runs into the same issue. Thanks for the help!
import io
import pandas
import requests
user='my_github_username'
pao='my_pao'  # personal access token, used in place of a password
github_session = requests.Session()
github_session.auth = (user, pao)
# providing raw url to download csv from github
csv_url = 'https://raw.githubusercontent.com/user/repo/master/csv_name.csv'
download = github_session.get(csv_url).content
# skip malformed rows (pandas >= 1.3; older versions use error_bad_lines=False)
downloaded_csv = pandas.read_csv(io.StringIO(download.decode('utf-8')), on_bad_lines='skip')
Adding another working example:
import requests
from requests.structures import CaseInsensitiveDict
# Variables
GITHUB_TOKEN = "my-personal-access-token"
GH_PREFIX = "https://raw.githubusercontent.com"
ORG = "my-user-name"
REPO = "my-repo-name"
BRANCH = "main"
FOLDER = "some-folder"
FILE = "some-file.csv"
URL = GH_PREFIX + "/" + ORG + "/" + REPO + "/" + BRANCH + "/" + FOLDER + "/" + FILE
# Headers setup
headers = CaseInsensitiveDict()
headers["Authorization"] = "token " + GITHUB_TOKEN
# Execute and view status
resp = requests.get(URL, headers=headers)
if resp.status_code == 200:
    print(resp.content)
else:
    print("Request failed!")
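The snippet above expects a `GITHUB_TOKEN` value. One way to supply it without writing the token into the script is to read it from an environment variable. The following is only a sketch: `token_from_env` and `auth_headers` are hypothetical helper names, and the variable name `GITHUB_TOKEN` is an assumption you can change.

```python
import os


def token_from_env(var_name="GITHUB_TOKEN"):
    """Return the token stored in the given environment variable, or raise if unset."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError("Set the {} environment variable first".format(var_name))
    return token


def auth_headers(token):
    """Authorization header in the same 'token <value>' form used above."""
    return {"Authorization": "token " + token}


# Usage, with URL defined as in the example above:
#   GITHUB_TOKEN = token_from_env()
#   resp = requests.get(URL, headers=auth_headers(GITHUB_TOKEN))
```

Keeping the token out of the source file matters here because the script may itself end up committed to a repository.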