I am trying to use BeautifulSoup v4 to parse a document. I call BeautifulSoup on note.content, which is a string returned by Evernote's API:
soup = BeautifulSoup(note.content)
I have enabled lxml in my app.yaml file:
libraries:
- name: lxml
version: "2.3"
Note that this works on my local development server. However, when deployed to Google's cloud I get the following error:
Error Trace:
Unicode parsing is not supported on this platform
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/controller/blog.py", line 101, in get
soup = BeautifulSoup(note.content)
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 168, in __init__
self._feed()
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 181, in _feed
self.builder.feed(self.markup)
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/builder/_lxml.py", line 62, in feed
self.parser.feed(markup)
File "parser.pxi", line 1077, in lxml.etree._FeedParser.feed (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:76196)
ParserError: Unicode parsing is not supported on this platform
UPDATE:
I checked out parser.pxi, and I found these lines of code which generated the error:
elif python.PyUnicode_Check(data):
if _UNICODE_ENCODING is NULL:
raise ParserError, \
u"Unicode parsing is not supported on this platform"
I think there must be something about GAE's deployement environment which causes this error, but I am not sure what.
UPDATE 2:
Because BeautifulSoup will automatically fall back on other parsers, I ended up removing lxml from my application entirely. Doing so fixed the problem.
Try to parse utf-8 string instead of unicode.
As of May 2012 this bug is still present in production, but not in the SDK (1.6.6).
However, rolling back to bs3 bypasses it on production for the time being:
http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With