
Subclassing db.TextProperty for storing python dict as JSON and setting default encoding to anything but ASCII

Using Google App Engine (Python SDK), I created a custom JSONProperty() as a subclass of db.TextProperty(). My goal is to store a Python dict on the fly as JSON and retrieve it easily. I followed various examples found via Google, and setting up the custom Property class and its methods is pretty easy.

However, some of my dict values (strings) are encoded in UTF-8. When saving the model to the datastore, I get the dreaded Unicode error (the default encoding for a datastore text property is ASCII). Subclassing db.BlobProperty instead didn't solve the issue.

Basically, my code does the following: it stores Resource entities in the datastore (with the URL as a StringProperty and the POST/GET payloads stored in a dict as a JSONProperty) and fetches them later (code not included). I chose not to use pickle for storing payloads because I'm a JSON freak and have no use for storing objects.

Custom JSONProperty:

class JSONProperty(db.TextProperty):
    def get_value_for_datastore(self, model_instance):
        value = super(JSONProperty, self).get_value_for_datastore(model_instance)
        return json.dumps(value)

    def make_value_from_datastore(self, value):
        if value is None:
            return {}
        if isinstance(value, basestring):
            return json.loads(value)
        return value

Putting the model into the datastore:

res = Resource()
res.init_payloads()
res.url = "http://www.somesite.com/someform/"
res.param = { 'name': "SomeField", 'default': u"éàôfoobarç" }
res.put()

This throws a UnicodeDecodeError related to ASCII encoding. It may be worth noting that I only get this error (every time) on the production server. I'm using Python 2.5.2 on dev.

Traceback (most recent call last):
  File "/base/data/home/apps/delpythian/1.350065314722833389/core/handlers/ResetHandler.py", line 68, in _res_one
    return res_one.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 984, in put
    return datastore.Put(self._entity, config=config)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 455, in Put
    return _GetConnection().async_put(config, entities, extra_hook).get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1219, in async_put
    for pbs in pbsgen:
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1070, in __generate_pb_lists
    pb = value_to_pb(value)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 239, in entity_to_pb
    return entity._ToPb()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 841, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1672, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1485, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)

My question is the following: is there a way to subclass db.TextProperty() and set/enforce a custom encoding? Or am I doing something wrong? I try to avoid using str() and follow the "Decode early, Unicode everywhere, encode late" rule.

Update: added code and stack trace.

asked Feb 04 '26 20:02 by jbmusso



2 Answers

Here's a minimal example of moving a unicode string from a dictionary to a serialized JSON string to a TextProperty:

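# Assumed imports: on the Python 2.5 runtime, simplejson shipped via django.utils.
from django.utils import simplejson
from google.appengine.ext import db, webapp

class Thing(db.Model):
  json = db.TextProperty()

class MainHandler(webapp.RequestHandler):
  def get(self):
    data = {'word': u"r\xe9sum\xe9"}
    # ensure_ascii=False leaves non-ASCII characters unescaped.
    json = simplejson.dumps(data, ensure_ascii=False)
    Thing(json=json).put()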

This works for me in both dev and prod.
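
To see what ensure_ascii changes, here is a small sketch that uses only the standard json module, so it runs outside App Engine. The is_pure_ascii helper is a hypothetical name introduced for illustration. With the default ensure_ascii=True, non-ASCII characters are written as \uXXXX escapes and the result is pure ASCII; with ensure_ascii=False they are kept as real characters, so the result must be treated as Unicode text.

```python
# -*- coding: utf-8 -*-
import json

# Same payload as in the handler above; u"r\xe9sum\xe9" is "résumé".
data = {'word': u"r\xe9sum\xe9"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(data)

# ensure_ascii=False: characters are emitted unescaped.
unescaped = json.dumps(data, ensure_ascii=False)


def is_pure_ascii(text):
    """Return True if every character in text is in the ASCII range."""
    return all(ord(ch) < 128 for ch in text)


print(is_pure_ascii(escaped))
print(is_pure_ascii(unescaped))

# Both forms decode back to the same dict.
print(json.loads(escaped) == json.loads(unescaped) == data)
```

Either form round-trips; the escaped form is the safer choice if anything downstream might still assume ASCII.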

answered Feb 06 '26 11:02 by Drew Sears



Looking at the line: PackString pbvalue.set_stringvalue(unicode(value).encode('utf-8')) UnicodeDecodeError: 'ascii'

it seems that App Engine expects all string values to be unicode. The call unicode(value) doesn't specify an encoding, so it falls back to the default encoding (normally ASCII) unless value is already a unicode object, e.g.:

>>> u = u"ąęćźż"
>>> s = u.encode('utf-8')
>>> unicode(u) # fine
>>> unicode(s, 'utf-8') # fine
>>> unicode(s) # blows up (tries ascii) (on my interpreter)

json.dumps can return a UTF-8 encoded byte string, and that's why unicode() without an explicit encoding can't handle it.

Try this:

return unicode(json.dumps(...), 'utf-8')

and you should be fine.
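
As a minimal sketch of that idea outside App Engine: the two helpers below mirror what get_value_for_datastore and make_value_from_datastore would do, but to_datastore_text and from_datastore_text are hypothetical names, not part of the db API. The point is that the serialized value is always handed on as Unicode text, decoded explicitly as UTF-8 and never through the interpreter's default encoding. The bytes check is written so the same code behaves on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import json


def to_datastore_text(value):
    """Serialize value to JSON and always return Unicode text."""
    serialized = json.dumps(value, ensure_ascii=False)
    # On Python 2, dumps() can hand back a byte string; decode it explicitly
    # as UTF-8 instead of relying on the default encoding.
    if isinstance(serialized, bytes):
        serialized = serialized.decode('utf-8')
    return serialized


def from_datastore_text(text):
    """Parse stored JSON text back into a dict; None becomes an empty dict."""
    if text is None:
        return {}
    return json.loads(text)


# The payload from the question, written with escapes for portability.
param = {'name': u"SomeField", 'default': u"\xe9\xe0\xf4foobar\xe7"}
stored = to_datastore_text(param)
restored = from_datastore_text(stored)

# The failure mode from the traceback: UTF-8 bytes decoded as ASCII.
utf8_bytes = stored.encode('utf-8')
try:
    utf8_bytes.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

print(restored == param)
print(ascii_decode_failed)
```

In a real property subclass, the body of to_datastore_text would go into get_value_for_datastore, assuming the datastore accepts the resulting unicode object as in the other answer.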

As for why App Engine blows up while your interpreter is fine, my guess is that it comes down to local settings: the docstring for unicode says it defaults to the current default encoding, which apparently is UTF-8 for you and ASCII for GAE.
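
A quick way to compare the two environments is to print the default encoding in each. This is only a diagnostic sketch; default_encoding is a hypothetical wrapper around sys.getdefaultencoding(). Python 2 normally reports 'ascii' unless the site configuration changes it, while Python 3 always reports 'utf-8'.

```python
import sys


def default_encoding():
    """Return the interpreter's default string encoding."""
    return sys.getdefaultencoding()


print(default_encoding())
```

If dev and production print different values, code that relies on implicit conversions can pass locally and fail once deployed, which is another reason to decode explicitly.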

answered Feb 06 '26 10:02 by gzy



