I have data in the database that needs updating periodically. The source of the data returns everything that's available at that point in time, so it will include new data that is not already in the database.
As I loop through the source data, I don't want to make thousands of individual writes if possible.
Is there anything like update_or_create that works in batches?
One thought was using update_or_create in combination with manual transactions, but I'm not sure if that just queues up the individual writes or if it would combine them all into one SQL insert.
Or, similarly, could using @commit_on_success() on a function with update_or_create inside the loop work?
I am not doing anything with the data other than translating it and saving it to a model. Nothing is dependent on that model existing during the loop.
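For context, here is a minimal sketch of the per-row approach described above (`Thing` and `source_id` are placeholder names, not from the original post). Wrapping the loop in `transaction.atomic()` groups everything into a single commit, but Django still issues one SELECT plus one INSERT or UPDATE per `update_or_create` call; it does not combine them into one statement:

```python
from django.db import transaction

# Placeholder model and field names for illustration only.
with transaction.atomic():
    for row in source_data:
        # One SELECT + one INSERT/UPDATE per iteration, even inside atomic().
        Thing.objects.update_or_create(
            source_id=row["id"],
            defaults={"name": row["name"]},
        )
```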
Since Django added support for bulk_update, this is now somewhat possible, though you need to do 3 database calls (a get, a bulk create, and a bulk update) per batch. It's a bit challenging to make a good interface to a general purpose function here, as you want the function to support both efficient querying as well as the updates. Here is a method I implemented that is designed for bulk update_or_create where you have a number of common identifying keys (which could be empty) and one identifying key that varies among the batch.
This is implemented as a method on a base model, but can be used independently of that. It also assumes that the base model has an auto_now timestamp field named updated_on; if that is not the case, the lines of code that assume this are commented for easy modification.
In order to use this in batches, chunk your updates into batches before calling it. This is also a way to get around data that can have one of a small number of values for a secondary identifier without having to change the interface.
```python
from django.db import models, transaction


class BaseModel(models.Model):
    updated_on = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True

    @classmethod
    def bulk_update_or_create(cls, common_keys, unique_key_name, unique_key_to_defaults):
        """
        common_keys: {field_name: field_value}
        unique_key_name: field_name
        unique_key_to_defaults: {field_value: {field_name: field_value}}

        ex. Event.bulk_update_or_create(
            {"organization": organization},
            "external_id",
            {1234: {"started": True}},
        )
        """
        with transaction.atomic():
            filter_kwargs = dict(common_keys)
            filter_kwargs[f"{unique_key_name}__in"] = unique_key_to_defaults.keys()
            existing_objs = {
                getattr(obj, unique_key_name): obj
                for obj in cls.objects.filter(**filter_kwargs).select_for_update()
            }
            create_data = {
                k: v for k, v in unique_key_to_defaults.items() if k not in existing_objs
            }
            for unique_key_value, obj in create_data.items():
                obj[unique_key_name] = unique_key_value
                obj.update(common_keys)
            creates = [cls(**obj_data) for obj_data in create_data.values()]
            if creates:
                cls.objects.bulk_create(creates)

            # This set should contain the name of the `auto_now` field of the model
            update_fields = {"updated_on"}
            updates = []
            for key, obj in existing_objs.items():
                obj.update(unique_key_to_defaults[key], save=False)
                update_fields.update(unique_key_to_defaults[key].keys())
                updates.append(obj)
            if existing_objs:
                cls.objects.bulk_update(updates, update_fields)
        return len(creates), len(updates)

    def update(self, update_dict=None, save=True, **kwargs):
        """Helper method to update objects"""
        if not update_dict:
            update_dict = kwargs
        # This set should contain the name of the `auto_now` field of the model
        update_fields = {"updated_on"}
        for k, v in update_dict.items():
            setattr(self, k, v)
            update_fields.add(k)
        if save:
            self.save(update_fields=update_fields)
```
Example usage:
```python
class Event(BaseModel):
    organization = models.ForeignKey(Organization, on_delete=models.CASCADE)
    external_id = models.IntegerField()
    started = models.BooleanField()


organization = Organization.objects.get(...)
updates_by_external_id = {
    1234: {"started": True},
    2345: {"started": True},
    3456: {"started": False},
}
Event.bulk_update_or_create(
    {"organization": organization}, "external_id", updates_by_external_id
)
```
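To apply the batching advice above, the updates dict can be split into fixed-size chunks before each call. `chunked` here is a small standalone helper written for illustration, not part of Django:

```python
from itertools import islice


def chunked(mapping, size):
    """Yield successive dicts of at most `size` items from `mapping`."""
    it = iter(mapping.items())
    while chunk := dict(islice(it, size)):
        yield chunk


updates = {i: {"started": True} for i in range(2500)}
batches = list(chunked(updates, 1000))
# Each batch can then be passed to bulk_update_or_create, e.g.:
# for batch in batches:
#     Event.bulk_update_or_create({"organization": organization}, "external_id", batch)
```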