 

Python 2.7 on Google App Engine, cannot use lxml.etree

I've been trying to use html5lib with lxml on Python 2.7 in Google App Engine. When I run the following code, it fails with "NameError: global name 'etree' is not defined". Is it not possible to use lxml.etree on Google App Engine, or am I missing something?

app.yaml

application: testsite
version: 1
runtime: python27
api_version: 1
threadsafe: false

handlers:
- url: /.*
  script: index.py   

libraries:
- name: lxml
  version: "2.3"  # I thought this would allow me to use lxml.etree

index.py

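# webapp must be imported here as well; it is used on the last line
from google.appengine.ext import webapp
from testhandler import TestHandler
application = webapp.WSGIApplication([('/', TestHandler)], debug=True)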

testhandler.py

import urllib2
import html5lib
from html5lib import treebuilders
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

from google.appengine.ext import webapp

class TestHandler(webapp.RequestHandler):
    def get(self):
        f = urllib2.urlopen("http://www.yahoo.com/").read()
        doc = html5lib.parse(f, treebuilder='lxml')
        elems = doc.xpath("//*[local-name() = 'a']")
        self.response.out.write(len(elems))

error

running with cElementTree on Python 2.5+
Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 769

Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
    handler.get(*groups)
  File "/home/test/testhandler.py", line 38, in get
    parser = html5lib.HTMLParser(tree= treebuilders.getTreeBuilder('lxml'))
  File "/home/test/html5lib/html5parser.py", line 68, in __init__
    self.tree = tree(namespaceHTMLElements)
  File "/home/test/html5lib/treebuilders/etree_lxml.py", line 176, in __init__
    builder = etree_builders.getETreeModule(etree, fullTree=fullTree)
NameError: global name 'etree' is not defined
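
The first line of the output above ("running with cElementTree on Python 2.5+") shows that the lxml import failed in this environment and the fallback was taken, yet the handler still asked html5lib for its 'lxml' treebuilder. A minimal sketch of making that decision explicit is below. The names select_etree, HAS_LXML and choose_treebuilder are illustrative only, and the assumption that html5lib accepts 'etree' as a treebuilder name in the installed version should be checked. It logs instead of printing, so nothing is written ahead of the response headers.

```python
import logging


def select_etree():
    """Return (etree_module, has_lxml); has_lxml is True only if lxml imported."""
    try:
        from lxml import etree
        logging.info("using lxml.etree")
        return etree, True
    except ImportError:
        # The standard-library ElementTree has no .xpath() method.
        import xml.etree.ElementTree as etree
        logging.warning("lxml is not importable; using xml.etree.ElementTree")
        return etree, False


etree, HAS_LXML = select_etree()


def choose_treebuilder():
    """Only request the 'lxml' treebuilder when lxml really imported."""
    # Assumption: 'etree' is a valid treebuilder name for the html5lib in use.
    return "lxml" if HAS_LXML else "etree"
```

With this in place, a handler could branch on HAS_LXML before calling doc.xpath(...), rather than failing inside html5lib.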

ADD

I tried several ways to create a doc object, but no luck. In one attempt I used "from lxml.html import document_fromstring", which gives me this error:

Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4143, in _HandleRequest
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch
    base_env_dict=env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch
    base_env_dict=base_env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch
    self._module_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/home/yoo/eclipse_workspace/website_checker/src/index.py", line 5, in <module>
    from handlers.updatecheck import UpdateCheckHandler
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/home/test/updatecheck.py", line 4, in <module>
    from lxml.html import document_fromstring
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 12, in <module>
    from lxml import etree
ImportError: cannot import name etree

According to the error, it seems App Engine doesn't allow me to load the etree module for some reason. I wanted to use XPath with lxml, but I can't spend much time figuring out what is going on here, and I don't know Python well enough either. So I'll try to find a way with the 'simpletree' version.

f = urllib2.urlopen("http://www.yahoo.com/").read()
p = html5lib.HTMLParser()
doc = p.parse(f)
# do something with doc.childNodes
self.response.out.write(len(doc.childNodes))  

Not really a good way, but at least it worked when I tested it on live Google App Engine.
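
For reference, the XPath used earlier, //*[local-name() = 'a'], counts every element named a regardless of namespace. The sketch below does the same count without lxml, assuming the parsed document is available as an ElementTree-compatible tree. Here the tree is built from a small inline XHTML string with the standard library; whether html5lib can hand back such a tree on App Engine is not verified. The helper names local_name and count_by_local_name and the SAMPLE markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    # Comments and processing instructions have non-string tags; pass them through.
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


def count_by_local_name(root, name):
    """Count all elements in the tree whose local name equals `name`."""
    return sum(1 for elem in root.iter() if local_name(elem.tag) == name)


# Hypothetical sample document standing in for the fetched page.
SAMPLE = (
    '<html xmlns="%s"><body>'
    '<p><a href="/one">one</a> and <a href="/two">two</a></p>'
    '</body></html>' % XHTML_NS
)

root = ET.fromstring(SAMPLE)
print(count_by_local_name(root, "a"))  # prints 2
```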

Yoo Matsuo asked Nov 15 '11 01:11



2 Answers

Have you installed lxml locally? I had the same error before: the import failed. You can download lxml here: http://pypi.python.org/pypi/lxml/

lxml works with GAE, which is great. But there is a real absence of documentation or examples about it right now.
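
A quick way to check the local install is to ask the interpreter directly. This is only a sketch: probe is a made-up helper name, and it assumes you run it with the same Python interpreter that launches the development server. For each module it reports either the file it was loaded from or the import error.

```python
import importlib
import sys


def probe(module_name):
    """Try to import a module; report where it came from or why it failed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return {"ok": False, "detail": str(exc)}
    # Built-in modules have no __file__ attribute.
    return {"ok": True, "detail": getattr(module, "__file__", "built-in")}


if __name__ == "__main__":
    print(sys.version)
    for name in ("lxml", "lxml.etree", "html5lib"):
        print(name, probe(name))
```

If lxml imports but lxml.etree does not, the package is present and its compiled extension is not loadable, which is a different problem from lxml not being installed at all.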

Artem Yarulin answered Nov 12 '22 18:11



On Windows, I had this problem because the Python 2.7 distribution does not include lxml. You can use the easy_install script, but you will have to compile lxml from source, which gave me trouble.

I used this post, which I found on the Google forums:

https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.python/Q8YeOIbn5Ds

However, if you want to save yourself the pain of trying to build it from source, just install a precompiled binary, for instance the one available from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Simply download the executable from the above website and run the *.exe; it installs all the code necessary.

TheChrisONeil answered Nov 12 '22 19:11
