I have a table containing about 30k records that I'm attempting to iterate over and process with Django's ORM. Each record stores several binary blobs, each of which can be several MB in size, that I need to process and write to a file.
However, I'm having trouble doing this with Django because of memory constraints. I have 8 GB of memory on my system, but after processing about 5k records, the Python process is consuming all 8 GB and gets killed by the Linux kernel. I've tried various tricks for clearing Django's query cache, such as:
MyModel.objects.update()
settings.DEBUG=False
gc.collect()
However, none of these seem to have any noticeable effect, and the process keeps leaking memory until it crashes.
Is there anything else I can do?
Since I only need to process each record once, and I never need to access the same record again later in the run, I have no need to save any model instance or to load more than one instance at a time. How can I ensure that only one record is loaded at a time, that Django caches nothing, and that memory is deallocated immediately after use?
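For reference, the loop looks roughly like this (MyModel, blob_a/blob_b and the output path are simplified stand-ins for my real code):

    from myapp.models import MyModel  # hypothetical app and model names

    def export_blobs(out_path):
        with open(out_path, "wb") as out:
            # Memory keeps growing as this loop runs.
            for record in MyModel.objects.all():
                # Each record holds several multi-MB binary blobs.
                out.write(bytes(record.blob_a))
                out.write(bytes(record.blob_b))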
Try using iterator().
A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.
That's a quote from the Django docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator
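As a rough illustration (the model and field names below are made up, not taken from your actual code), your loop could become:

    from myapp.models import MyModel  # hypothetical names, as in the question

    with open("blobs.out", "wb") as out:
        # .iterator() streams rows from the database cursor instead of
        # caching every model instance on the QuerySet, so memory use
        # stays roughly flat across all 30k records.
        for record in MyModel.objects.iterator():
            out.write(bytes(record.blob_a))
            out.write(bytes(record.blob_b))

On newer Django versions (2.0+), iterator() also accepts a chunk_size argument that controls how many rows are fetched from the database at a time.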