I have a Django model in which I would like to save schema-less key-value metadata, so I'm using a django.contrib.postgres.fields.JSONField
.
The key-value data is a Pandas series, which is not JSON-serializable by default (due to numpy.int64, numpy.float64 types), so I use the handy series.to_json()
which returns an already serialized JSON string.
To save it to my model, I run json.loads
on it. But I know that json.dumps
is called on this when the data is saved to Postgres.
How can I avoid this unnecessary deserialization/re-serialization step?
Code example:
def create_model(pandas_series):
mdl = Model()
metadata = pandas_series.to_json() # gives a JSON string
mdl.metadata = json.loads(metadata) # string->dict, then dict->string
mdl.save()
I created a dictionary with thousand elements programatically it looks something like this:
d = { ... , "a240": 240, "a243": 243, "a242", ...}
Then I json.loads(json.dumps(d))
a 1000 times in a loop. On average the 1000 iterations took 510 milliseconds.
Then I created the following model:
class JsonModel(models.Model):
jfield = JSONField()
Inserted the dictionary 1000 times into the database as follows:
JsonModel.objects.create(jfield = d)
This operation took 21 seconds on average. So it appears that the real bottleneck is in saving the data to the database rather than in the serialization deserialization step.
So in conclusion, saving to database is roughly 40 times slower than json dumps/loads (but your mileage will vary depending on size of dictionary, load on the db etc).
It may not be worth the effort to not dumps/loads based on the numbers above. Nevetheless if you are keen to try it, the thing to do is to create a custom field. It can be something as simple as the one below:
from django.contrib.postgres.fields import JSONField
class MyJsonField(JSONField):
def get_prep_value(self, value):
if isinstance(value, str):
return value
return super(MyJsonField,self).get_prep_value(value)
Then your model can be
class JsonModel(models.Model):
metadata = MyJsonField()
....
And then at the time of creating the db record, you can pass in the json serialized sting instead of calling json.loads
mdl.metadata = metadata # no string->dict, then dict->string
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With