
Parse HTML from local file

I'm using Google App Engine with Python. I want to get the tree of an HTML file that lives in the same project as my Python script. I tried several things, including the absolute URL (e.g. http://localhost:8080/nl/home.html) and the relative URL (/nl/home.html). Neither works. I use this code:

class HomePage(webapp2.RequestHandler):    
    def get(self):

        path = self.request.path

        htmlfile = etree.parse(path)
        template = jinja_environment.get_template('/nl/template.html')

        pagetitle = htmlfile.find(".//title").text
        body = htmlfile.get_element_by_id("body").toString()

It returns the following error: IOError: Error reading file '/nl/home.html': failed to load external entity "/nl/home.html"

Does anyone know how to get the tree of an HTML file from the same project with Python?

EDIT

This is the working code:

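import logging
import urllib

import webapp2
from lxml import html

# jinja_environment is assumed to be configured elsewhere in the app.

class HomePage(webapp2.RequestHandler):
    def get(self):
        # Drop the leading "/" so the path is relative to the app's working directory.
        path = self.request.path.replace("/", "", 1)
        logging.info(path)

        # In Python 2, urllib.urlopen also opens local file paths.
        htmlfile = html.fromstring(urllib.urlopen(path).read())
        template = jinja_environment.get_template('/nl/template.html')

        pagetitle = htmlfile.find(".//title").text
        body = innerHTML(htmlfile.get_element_by_id("body"))

def innerHTML(node):
    # Serialize each child element of the node and concatenate the results.
    buildString = ''
    for child in node:
        buildString += html.tostring(child)
    return buildString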

Simon asked May 02 '26 15:05



1 Answer

Your working directory is the base of your app directory. So if your app is organized like:

  • app.yaml
  • nl/
    • home.html

You can then read your file at nl/home.html, a path relative to the working directory with no leading slash (assuming you haven't changed your working directory).
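
As a rough sketch of that idea (not App Engine specific), the request path can be mapped to a file under the app directory before reading it. The helper names local_path_for and read_html, the app_root parameter, and the UTF-8 encoding are assumptions made for illustration; the check against ".." is an extra precaution that the question does not mention.

```python
import io
import os


def local_path_for(request_path, app_root=None):
    """Map a request path such as '/nl/home.html' to a file under app_root.

    app_root defaults to the current working directory, which the answer
    above says is the base of the app directory.
    """
    if app_root is None:
        app_root = os.getcwd()
    app_root = os.path.abspath(app_root)

    # Strip leading slashes so the path is treated as relative, not absolute.
    relative = request_path.lstrip("/")
    candidate = os.path.abspath(os.path.join(app_root, relative))

    # Refuse paths that climb out of the app directory (e.g. '/../x').
    rel = os.path.relpath(candidate, app_root)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError("path escapes the app directory: %r" % request_path)
    return candidate


def read_html(request_path, app_root=None):
    """Return the file's contents as text (UTF-8 is assumed here)."""
    with io.open(local_path_for(request_path, app_root), encoding="utf-8") as f:
        return f.read()
```

The returned string can then be handed to lxml's html.fromstring, as in the working code from the question.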

R Samuel Klatchko answered May 05 '26 04:05
