I am trying to read a text file stored on GitHub using the requests package. Here is the Python code I am using:
import requests
url = 'https://github.com/...../filename'
page = requests.get(url)
print(page.text)
Instead of the text, I am getting HTML tags. How can I read the text of the file instead of the HTML?
There are some good solutions already, but if you are using requests, you can simply follow GitHub's API.
The endpoint for all content is
GET /repos/:owner/:repo/contents/:path
But keep in mind that, by default, GitHub's API returns the file content encoded in base64.
In your case you would do the following:
#!/usr/bin/env python3
import base64
import requests
# replace {user}, {repo_name} and {path_to_file} with real values
url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'
req = requests.get(url)
if req.status_code == requests.codes.ok:
    req = req.json()  # the response is a JSON
    # req is now a dict with keys: name, encoding, url, size ...
    # and content. But it is encoded with base64.
    # b64decode accepts the str from the JSON; .decode() turns the bytes into text
    content = base64.b64decode(req['content']).decode('utf-8')
else:
    print('Content was not found.')
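The decoding step can be checked offline. The sketch below wraps it in a small helper; the name decode_contents_payload is hypothetical, the sample dictionary is hand-made to imitate the shape of the API response (it is not real API output), and the file is assumed to be UTF-8 text.

```python
import base64


def decode_contents_payload(payload: dict, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn a contents-API style dict into the file's text."""
    # the response states how the "content" field is encoded
    if payload.get("encoding") != "base64":
        raise ValueError("unexpected encoding: {!r}".format(payload.get("encoding")))
    # b64decode accepts an ASCII str and, with the default validate=False,
    # discards characters outside the base64 alphabet such as line breaks
    raw_bytes = base64.b64decode(payload["content"])
    return raw_bytes.decode(encoding)


# hand-made sample imitating the response shape (assumption, not real API output);
# encodebytes inserts line breaks into the base64 text
sample = {
    "name": "hello.txt",
    "encoding": "base64",
    "content": base64.encodebytes("Hello, GitHub!\n".encode("utf-8")).decode("ascii"),
}
print(decode_contents_payload(sample))
```

Checking the "encoding" field first makes an unexpected response fail loudly instead of producing garbage text.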
Alternatively, you can get the plain text by changing the beginning of your link from https://github.com/ to
https://raw.githubusercontent.com/ and removing the /blob/ segment from the path.
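A minimal sketch of that rewrite, assuming the usual github.com/<owner>/<repo>/blob/<ref>/<path> link layout; to_raw_url and fetch_raw_text are hypothetical names. requests is imported inside fetch_raw_text only so that the URL helper can be used without it.

```python
from urllib.parse import urlsplit


def to_raw_url(github_url: str) -> str:
    """Map github.com/<owner>/<repo>/blob/<ref>/<path> to raw.githubusercontent.com."""
    parts = urlsplit(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError("expected a github.com URL")
    segments = parts.path.strip("/").split("/")
    # expected layout: owner, repo, "blob", ref, then one or more path segments
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected github.com/<owner>/<repo>/blob/<ref>/<path>")
    owner, repo, _blob, *rest = segments
    # the raw host uses the same path, just without the "blob" segment
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


def fetch_raw_text(github_url: str, timeout: float = 10.0) -> str:
    import requests  # imported here so to_raw_url works even without requests installed

    response = requests.get(to_raw_url(github_url), timeout=timeout)
    response.raise_for_status()  # raise on 404 and other HTTP errors
    return response.text
```

Usage would look like print(fetch_raw_text('https://github.com/<owner>/<repo>/blob/<branch>/<path>')), with the placeholders replaced by your own values.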
Thank you @dasdachs for your answer. However I was getting an error when executing the following line:
content = base64.decodestring(req['content'])
The error I got was:
/usr/lib/python3.6/base64.py in _input_type_check(s)
511 except TypeError as err:
512 msg = "expected bytes-like object, not %s" % s.__class__.__name__
--> 513 raise TypeError(msg) from err
514 if m.format not in ('c', 'b', 'B'):
515 msg = ("expected single byte elements, not %r from %s" %
TypeError: expected bytes-like object, not str
So I replaced it with the following line:
content = base64.b64decode(req['content'])
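The difference can be reproduced without any network access. In Python 3, decodestring was a deprecated alias of decodebytes (it was removed in Python 3.9); decodebytes accepts only bytes-like input, while b64decode also accepts an ASCII str such as a value taken from response.json(). The encoded string below is a made-up example.

```python
import base64

# a str, as it would come out of response.json()
encoded = "SGVsbG8sIEdpdEh1YiE=\n"

# b64decode accepts an ASCII str as well as bytes
decoded = base64.b64decode(encoded)
print(decoded)  # b'Hello, GitHub!'

# decodebytes (the function decodestring aliased) rejects a str
try:
    base64.decodebytes(encoded)
except TypeError as err:
    print("TypeError:", err)  # expected bytes-like object, not str

# encoding the str to bytes first makes decodebytes work too
print(base64.decodebytes(encoded.encode("ascii")))  # b'Hello, GitHub!'
```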
Here is my full working snippet (run with Python 3); it assumes the file in the repository is a JSON object:
import base64
import json
import requests
def constructURL(user="404", repo_name="404", path_to_file="404", url="404"):
    # fill the {user}, {repo_name} and {path_to_file} placeholders of the template URL
    url = url.replace("{user}", user)
    url = url.replace("{repo_name}", repo_name)
    url = url.replace("{path_to_file}", path_to_file)
    return url
user = '<provide value>'
repo_name = '<provide value>'
path_to_file = '<provide value>'
json_url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'
json_url = constructURL(user, repo_name, path_to_file, json_url)  # forms the correct URL
response = requests.get(json_url)  # get the data for the JSON file located at the specified URL
if response.status_code == requests.codes.ok:
    jsonResponse = response.json()  # the response is a JSON
    # the file content is encoded in base64, hence decode it
    content = base64.b64decode(jsonResponse['content'])
    # convert the byte stream to a string
    jsonString = content.decode('utf-8')
    finalJson = json.loads(jsonString)
    # print inside the if-branch so finalJson is always defined when it is used
    for key, value in finalJson.items():
        print("The key and value are ({}) = ({})".format(key, value))
else:
    print('Content was not found.')
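As a variation on constructURL, the URL can be built directly with str.format. This sketch (contents_url is a hypothetical name) also percent-encodes each part with urllib.parse.quote, so a path containing spaces still produces a valid URL, while "/" in nested paths is left as is.

```python
from urllib.parse import quote


def contents_url(user: str, repo_name: str, path_to_file: str) -> str:
    """Build the contents-endpoint URL for one file (hypothetical helper)."""
    # quote() percent-encodes characters such as spaces; "/" is kept by default,
    # so nested paths like "data/my file.json" keep their directory separators
    return "https://api.github.com/repos/{}/{}/contents/{}".format(
        quote(user), quote(repo_name), quote(path_to_file.lstrip("/"))
    )


print(contents_url("octocat", "Hello-World", "data/my file.json"))
```

Unlike the replace-based version with "404" defaults, a missing argument here raises a TypeError instead of silently producing a URL that contains "404".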