Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django + Postgres: save JSON string directly into model as JSON type

I have a Django model in which I would like to save schema-less key-value metadata, so I'm using a django.contrib.postgres.fields.JSONField.

The key-value data is a Pandas series, which is not JSON-serializable by default (due to numpy.int64, numpy.float64 types), so I use the handy series.to_json() which returns an already serialized JSON string.

To save it to my model, I run json.loads on it. But I know that json.dumps is called on this when the data is saved to Postgres.

How can I avoid this unnecessary deserialization/re-serialization step?

Code example:

def create_model(pandas_series):
    mdl = Model()
    metadata = pandas_series.to_json()  # gives a JSON string
    mdl.metadata = json.loads(metadata) # string->dict, then dict->string
    mdl.save()
like image 427
glifchits Avatar asked Nov 14 '16 16:11

glifchits


1 Answers

Where is the bottleneck

I created a dictionary with thousand elements programatically it looks something like this:

d = { ... , "a240": 240, "a243": 243, "a242", ...}

Then I json.loads(json.dumps(d)) a 1000 times in a loop. On average the 1000 iterations took 510 milliseconds.

Then I created the following model:

class JsonModel(models.Model):
    jfield = JSONField()

Inserted the dictionary 1000 times into the database as follows:

JsonModel.objects.create(jfield = d)

This operation took 21 seconds on average. So it appears that the real bottleneck is in saving the data to the database rather than in the serialization deserialization step.

So in conclusion, saving to database is roughly 40 times slower than json dumps/loads (but your mileage will vary depending on size of dictionary, load on the db etc).

Avoiding the extra serialize / desrialize steps

It may not be worth the effort to not dumps/loads based on the numbers above. Nevetheless if you are keen to try it, the thing to do is to create a custom field. It can be something as simple as the one below:

from django.contrib.postgres.fields import JSONField
class MyJsonField(JSONField):
    def get_prep_value(self, value):
        if isinstance(value, str):
            return value
        return super(MyJsonField,self).get_prep_value(value)

Then your model can be

class JsonModel(models.Model):
    metadata = MyJsonField()
    ....

And then at the time of creating the db record, you can pass in the json serialized sting instead of calling json.loads

mdl.metadata = metadata # no string->dict, then dict->string
like image 54
e4c5 Avatar answered Sep 27 '22 23:09

e4c5