
Reading a github file using python returns HTML tags

Tags:

python

I am trying to read a text file stored on GitHub using the requests package. Here is the Python code I am using:

    import requests
    url = 'https://github.com/...../filename'
    page = requests.get(url)
    print(page.text)

Instead of the file's text, I get the HTML of the GitHub page. How can I read the contents of the file itself rather than the HTML?

Sandy asked Jul 20 '16 22:07


3 Answers

There are some good solutions already, but if you are using requests, you can simply go through GitHub's API.

The endpoint for all content is

GET /repos/:owner/:repo/contents/:path

Keep in mind that, by default, GitHub's API returns the file content encoded in base64.

In your case you would do the following:

#!/usr/bin/env python3
import base64
import requests


# replace {user}, {repo_name} and {path_to_file} with real values
url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'
req = requests.get(url)
if req.status_code == requests.codes.ok:
    req = req.json()  # the response is a JSON
    # req is now a dict with keys: name, encoding, url, size ...
    # and content. But it is encoded with base64.
    # Note: base64.decodestring() rejects str on Python 3 and was removed in 3.9;
    # b64decode() accepts the str from the JSON and returns bytes.
    content = base64.b64decode(req['content'])
else:
    print('Content was not found.')
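
To make the decoding step concrete without calling the network, here is a minimal sketch that works on a hand-made dictionary shaped like the response described above. The sample payload, its values, and the helper name decode_content are assumptions for illustration only; a real response carries more keys, and its content field may contain line breaks.

```python
import base64

# Hand-made stand-in for the JSON returned by the contents endpoint.
# Only the two fields used below are included; the values are illustrative.
sample_response = {
    'encoding': 'base64',
    'content': base64.encodebytes(b'hello from a text file\n').decode('ascii'),
}


def decode_content(payload):
    """Return the file text from a contents-style dict (hypothetical helper)."""
    if payload.get('encoding') != 'base64':
        raise ValueError('unexpected encoding: {!r}'.format(payload.get('encoding')))
    # b64decode accepts an ASCII str and, by default, skips embedded newlines
    raw_bytes = base64.b64decode(payload['content'])
    # assumes the file is UTF-8 text
    return raw_bytes.decode('utf-8')


text = decode_content(sample_response)
print(text)
```

Checking the encoding key first is a cautious choice: it avoids silently decoding a payload that is not base64.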

dasdachs answered Nov 15 '22 21:11


You can get the plain-text version by replacing https://github.com/ at the beginning of your link with

https://raw.githubusercontent.com/

and removing the /blob segment from the path, if the link has one.
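
The URL change can be sketched as a small helper. This is only an illustration: the function name to_raw_url and the example repository are assumptions, and it expects the common /{user}/{repo}/blob/{branch}/{path} layout of a github.com file link.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(page_url):
    """Convert a github.com file-page URL to its raw.githubusercontent.com form.

    Hypothetical helper; assumes the /{user}/{repo}/blob/{branch}/{path} layout.
    """
    parts = urlsplit(page_url)
    segments = parts.path.strip('/').split('/')
    # Expect at least: user, repo, 'blob', branch, and one path segment
    if parts.netloc != 'github.com' or len(segments) < 5 or segments[2] != 'blob':
        raise ValueError('not a github.com file URL: {}'.format(page_url))
    # Drop the 'blob' segment and keep everything else in order
    raw_path = '/' + '/'.join(segments[:2] + segments[3:])
    return urlunsplit(('https', 'raw.githubusercontent.com', raw_path, '', ''))


# Illustrative URL, not taken from the question
raw_url = to_raw_url('https://github.com/octocat/Hello-World/blob/master/README')
print(raw_url)
```

The resulting URL can then be passed to requests.get() in place of the original link, and page.text should hold the file's text rather than a web page.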

patrick answered Nov 15 '22 22:11


Thank you @dasdachs for your answer. However, I got an error when executing the following line:

content = base64.decodestring(req['content'])

The error I got was:

/usr/lib/python3.6/base64.py in _input_type_check(s)
    511     except TypeError as err:
    512         msg = "expected bytes-like object, not %s" % s.__class__.__name__
--> 513         raise TypeError(msg) from err
    514     if m.format not in ('c', 'b', 'B'):
    515         msg = ("expected single byte elements, not %r from %s" %

TypeError: expected bytes-like object, not str

Hence I replaced it with the line below:

content = base64.b64decode(req['content'])
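
The difference between the two calls can be reproduced in isolation. The sketch below is not part of either answer's program; it uses a made-up encoded string and base64.decodebytes(), which is the current name of the old decodestring() alias.

```python
import base64

# base64 for b'hello', held as a str just like a field parsed from JSON
encoded_str = 'aGVsbG8='

# decodebytes() (formerly also available as decodestring()) insists on bytes
try:
    base64.decodebytes(encoded_str)
    rejected = False
except TypeError:
    rejected = True

# Either encode the str to bytes first ...
from_bytes = base64.decodebytes(encoded_str.encode('ascii'))
# ... or call b64decode(), which accepts an ASCII str directly
from_str = base64.b64decode(encoded_str)

print(rejected, from_bytes, from_str)
```

Both working variants return bytes, so a further .decode('utf-8') is still needed to get a str.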

Here is my working snippet (run with Python 3). It assumes the file being fetched contains JSON:

import base64
import json

import requests


def constructURL(user="404", repo_name="404", path_to_file="404", url="404"):
    # substitute the placeholders in the URL template
    url = url.replace("{user}", user)
    url = url.replace("{repo_name}", repo_name)
    url = url.replace("{path_to_file}", path_to_file)
    return url


user = '<provide value>'
repo_name = '<provide value>'
path_to_file = '<provide value>'
json_url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'

json_url = constructURL(user, repo_name, path_to_file, json_url)  # forms the correct URL
response = requests.get(json_url)  # fetch the file's metadata and content from the API

if response.status_code == requests.codes.ok:
    jsonResponse = response.json()  # the response is a JSON
    # the file content is base64-encoded, hence decode it
    content = base64.b64decode(jsonResponse['content'])
    # convert the byte stream to string
    jsonString = content.decode('utf-8')
    finalJson = json.loads(jsonString)
    # print only on success, so finalJson is always defined here
    for key, value in finalJson.items():
        print("The key and value are ({}) = ({})".format(key, value))
else:
    print('Content was not found.')

qazjvm answered Nov 15 '22 21:11