
Fetching a million records from Django with a queryset is slow

I want to iterate over all the objects of a table (Post). I am using the code below:

posts = Post.objects.all()
for post in posts:
    process_post(post)

process_post is a Celery task that runs in the background and does not update the post. The problem is that the Post table has 1 million records. This is not a one-time job; I run it daily.

for post in posts

In the line above, the query is executed and fetches all the data from the DB in one go.

How can I improve its performance? Is there any way to fetch the data in batches?

Himanshu dua asked Apr 21 '17 10:04



1 Answer

Make your own iterator. For example, with 1 million records:

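# Order by primary key so successive slices are stable and do not overlap.
queryset = Post.objects.order_by('pk')
count = queryset.count()  # about 1 million
chunk_size = 1000
for start in range(0, count, chunk_size):
    # Each slice runs its own query with LIMIT chunk_size OFFSET start.
    posts = queryset[start:start + chunk_size]
    for post in posts:
        process_post(post)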

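Slicing a queryset translates to LIMIT and OFFSET in the SQL query. A larger chunk_size means fewer queries but more memory used per batch: with 1 million rows, a chunk_size of 1000 issues 1,000 batch queries, while 10,000 issues only 100. Tune it for your use case.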
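
The chunked loop above can be wrapped in a small reusable generator. This is only a minimal sketch: the name iter_chunks and its parameters are hypothetical and not part of Django. It assumes nothing more than that the object you pass supports slicing, which Django querysets do. A plain list stands in for the queryset so the snippet runs without a database.

```python
def iter_chunks(sliceable, total, chunk_size=1000):
    """Yield successive slices of at most chunk_size items.

    Hypothetical helper: with a Django queryset, each slice would
    become one query using LIMIT/OFFSET.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield sliceable[start:start + chunk_size]


# Stand-in data (assumption): a list instead of a real queryset.
records = list(range(10))
batches = [list(batch) for batch in iter_chunks(records, len(records), chunk_size=4)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With Django you would presumably call it as iter_chunks(Post.objects.order_by('pk'), Post.objects.count()), ordering by primary key so the slices stay stable between queries. Django also provides QuerySet.iterator(), which avoids caching the whole result set on the queryset; whether it fits depends on your Django version and database backend.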

itzMEonTV answered Oct 08 '22 15:10
