I have the following models:
from django.db import models

class LocationPoint(models.Model):
    latitude = models.DecimalField(max_digits=16, decimal_places=12)
    longitude = models.DecimalField(max_digits=16, decimal_places=12)

    class Meta:
        unique_together = (
            ('latitude', 'longitude',),
        )

class GeoLogEntry(models.Model):
    device = models.ForeignKey(Device, on_delete=models.PROTECT)  # Device is defined elsewhere in the project
    location_point = models.ForeignKey(LocationPoint, on_delete=models.PROTECT)
    recorded_at = models.DateTimeField(db_index=True)
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)
I have lots of incoming records to create (probably thousands at once).
Currently I create them like this:
# Simplified map function contents (removed mapping from dict as it's unrelated to the question topic)
points_models = map(
    lambda point: LocationPoint(latitude=point.latitude, longitude=point.longitude),
    points
)
LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)
# Simplified map function contents (removed mapping from dict as it's unrelated to the question topic)
geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device,
        # one SELECT per log entry
        location_point=LocationPoint.objects.get(
            latitude=log_entry.latitude, longitude=log_entry.longitude
        ),
        recorded_at=log_entry.recorded_at
    ),
    log_entries
)
GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)
But I think this is not very efficient, because it runs N SELECT queries for N records. Is there a better way to do this?
I use Python 3.9, Django 3.1.2 and PostgreSQL 12.4.
The main problem is fetching the objects to link to in bulk. Once all of the points have been stored, we can fetch them in bulk:
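from django.db.models import Q

# Store the points; rows that already exist are skipped thanks to the unique constraint.
# With ignore_conflicts=True Django does not set the primary keys on these instances,
# hence the separate fetch below.
points_models = [
    LocationPoint(latitude=point.latitude, longitude=point.longitude)
    for point in points
]
LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)

# One OR-ed filter: (latitude = x1 AND longitude = y1) OR (latitude = x2 AND longitude = y2) OR ...
qfilter = Q(
    *[
        Q(('latitude', entry.latitude), ('longitude', entry.longitude))
        for entry in log_entries
    ],
    _connector=Q.OR
)

# A single query: map (longitude, latitude) to the primary key
data = {
    (lp.longitude, lp.latitude): lp.pk
    for lp in LocationPoint.objects.filter(qfilter)
}

# Link by primary key, so no LocationPoint instances are needed here
geo_log_entries = [
    GeoLogEntry(
        device=entry.device,
        location_point_id=data[entry.longitude, entry.latitude],
        recorded_at=entry.recorded_at
    )
    for entry in log_entries
]
GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)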
We thus fetch all the objects we need to link to in bulk (with a single query), build a dictionary that maps each (longitude, latitude) pair to the primary key, and then set location_point_id to that key.
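The dictionary step itself does not depend on Django, so it can be sketched in plain Python. This is a minimal sketch under assumptions: `rows` stands in for what the single `LocationPoint.objects.filter(qfilter)` query would return, and `Entry`, `build_pk_map` and `resolve_point_ids` are hypothetical names. It also shows why `Decimal` keys work here: `Decimal("48.856613")` and `Decimal("48.856613000000")` compare equal and hash equal, so the lookup succeeds even though the database returns 12 decimal places.

```python
from decimal import Decimal
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    # Hypothetical stand-in for one incoming log entry.
    latitude: Decimal
    longitude: Decimal
    recorded_at: str


Key = Tuple[Decimal, Decimal]


def build_pk_map(rows: Iterable[Tuple[int, Decimal, Decimal]]) -> Dict[Key, int]:
    """Map (longitude, latitude) -> primary key, like the `data` dict in the answer."""
    return {(lon, lat): pk for pk, lat, lon in rows}


def resolve_point_ids(entries: Iterable[Entry], pk_map: Dict[Key, int]) -> List[int]:
    """Look up each entry's point id in memory; no per-entry query is needed."""
    ids = []
    for entry in entries:
        key = (entry.longitude, entry.latitude)
        if key not in pk_map:
            raise KeyError(f"no stored LocationPoint for {key}")
        ids.append(pk_map[key])
    return ids


# Rows as one bulk query might return them: (pk, latitude, longitude), 12 decimal places.
rows = [
    (1, Decimal("52.370216000000"), Decimal("4.895168000000")),
    (2, Decimal("48.856613000000"), Decimal("2.352222000000")),
]
entries = [
    Entry(Decimal("48.856613"), Decimal("2.352222"), "2020-10-01T12:00"),
    Entry(Decimal("52.370216"), Decimal("4.895168"), "2020-10-01T12:05"),
]
pk_map = build_pk_map(rows)
print(resolve_point_ids(entries, pk_map))  # [2, 1]
```

Raising a `KeyError` with the missing pair makes a mismatch (for example a float that slipped in) easy to spot instead of silently linking the wrong point.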
It is, however, important to use decimals, or at least a type whose values compare equal to what the database returns. Floating-point numbers are tricky, since they easily pick up rounding errors; that is why longitudes and latitudes are often stored as "fixed-point" numbers, for example as integers that are 1,000 or 1,000,000 times larger. Otherwise, you need a matching strategy that tolerates the differences between your input values and the data returned by the query.
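To illustrate the floating-point pitfall, here is a small sketch assuming the `decimal_places=12` columns from the models in the question; `normalize` and `to_fixed_point` are hypothetical helpers, not Django APIs. Incoming values are converted through `str()` and `quantize()` so they equal the `Decimal` values the database hands back, and the fixed-point alternative is shown with a scale of 1,000,000.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Assumption: both columns are DecimalField(decimal_places=12), so the
# database returns Decimals with 12 decimal places.
DB_QUANTUM = Decimal("1e-12")


def normalize(value) -> Decimal:
    """Hypothetical helper: convert str/int/float/Decimal input to a 12-place Decimal."""
    # str() first, so a float contributes its short repr ("52.370216")
    # instead of its exact binary expansion.
    return Decimal(str(value)).quantize(DB_QUANTUM, rounding=ROUND_HALF_EVEN)


def to_fixed_point(value, scale: int = 1_000_000) -> int:
    """Hypothetical helper: fixed-point alternative, degrees scaled to an integer."""
    return int((Decimal(str(value)) * scale).to_integral_value(rounding=ROUND_HALF_EVEN))


stored = Decimal("52.370216000000")  # a value as read back from the database
lookup = {stored: 1}                 # like the (longitude, latitude) -> pk dict

print(stored == 52.370216)           # False: the float is not exactly 52.370216
print(lookup.get(52.370216))         # None: a float key misses the dict entry
print(lookup[normalize(52.370216)])  # 1: the normalized Decimal matches
print(to_fixed_point(52.370216))     # 52370216
```

Normalizing at the point where the incoming records are parsed keeps both the `bulk_create` values and the dictionary keys consistent with what the query returns.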