 

Fetching queryset data one by one

I am aware that a regular queryset, or the iterator() queryset method, evaluates and returns the entire data set in one shot.

For instance, take this:

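my_objects = MyObject.objects.all()
for row in my_objects:             # Way 1: results are cached on the queryset
    pass                           # process row here
for row in my_objects.iterator():  # Way 2: results are not cached
    pass                           # process row here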

Question

In both methods, all the rows are fetched in a single go. Is there any way in Django to fetch the queryset rows one by one from the database?

Why this weird requirement

At present my query fetches, let's say, n rows, but sometimes I get the Python/Django OperationalError (2006, 'MySQL server has gone away').

So, as a workaround, I am currently using some weird while-loop logic. I was wondering whether there is any native or built-in method, or whether my question even makes sense in the first place! :)

NoobEditor asked Sep 10 '15 06:09


2 Answers

I think you are looking to limit your queryset.

Quote from the Django documentation on limiting QuerySets:

Use a subset of Python’s array-slicing syntax to limit your QuerySet to a certain number of results. This is the equivalent of SQL’s LIMIT and OFFSET clauses.

In other words, if you start with a count, you can then loop and take slices as you need them.

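cnt = MyObject.objects.count()
start_point = 0
inc = 5
while start_point < cnt:
    # Order by pk so consecutive slices are stable and don't overlap or skip rows
    filtered = MyObject.objects.order_by('pk')[start_point:start_point + inc]
    for row in filtered:
        pass  # process row here
    start_point += inc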

Of course, you may need to add more error handling to this.
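
Django turns a slice such as MyObject.objects.all()[10:15] into LIMIT 5 OFFSET 10, so the loop above reads the table in windows of `inc` rows. Here is a minimal pure-Python sketch of the same windowing over an ordinary list, assuming the list stands in for the ordered queryset. The name `sliced_batches` is hypothetical and not a Django API:

```python
def sliced_batches(rows, inc):
    """Yield consecutive slices of `rows`, mirroring qs[start:start + inc].

    `rows` is a plain list standing in for an ordered queryset (an assumption
    for illustration); each slice corresponds to one LIMIT/OFFSET query.
    """
    cnt = len(rows)  # plays the role of MyObject.objects.count()
    start_point = 0
    while start_point < cnt:  # `<`, not `start_point + inc <`, so the last partial batch is kept
        yield rows[start_point:start_point + inc]
        start_point += inc


batches = list(sliced_batches(list(range(12)), 5))
print(batches)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

One caveat of the offset approach: the database typically still has to skip over the first `start_point` rows on every query, so later batches can get slower on large tables. Rows inserted or deleted between queries can also shift the windows.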

Sayse answered Sep 28 '22 23:09


Fetching row by row might be worse. You might want to retrieve in batches of 1000 or so instead. I have used this Django snippet (not my work) successfully with very large querysets. It doesn't eat up memory, and I have had no trouble with connections going away.

Here's the snippet:

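import gc


def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django QuerySet ordered by the primary key.

    This method loads a maximum of chunksize (default: 1000) rows in its
    memory at the same time, while Django normally would load all rows in its
    memory. Using the iterator() method only causes it to not preload all the
    classes.

    Note that the implementation of the iterator does not support ordered querysets.
    '''
    last = queryset.order_by('-pk').first()
    if last is None:  # empty queryset: nothing to yield
        return
    last_pk = last.pk
    pk = 0  # assumes positive integer primary keys
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        # Fetch at most `chunksize` rows whose pk is above the last one seen
        chunk = list(queryset.filter(pk__gt=pk)[:chunksize])
        if not chunk:  # rows were deleted meanwhile; stop instead of looping forever
            break
        for row in chunk:
            pk = row.pk
            yield row
        gc.collect()  # force collection of unreachable objects between chunks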
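
The snippet is a form of keyset (seek) pagination: instead of an OFFSET, each query asks for rows whose primary key is greater than the last one seen, so earlier rows are never re-scanned. Here is a minimal pure-Python sketch of that loop over a sorted list of integer keys, assuming the list stands in for `queryset.order_by('pk')`. `keyset_batches` is a hypothetical name, not part of Django:

```python
import bisect


def keyset_batches(sorted_pks, chunksize):
    """Yield chunks of keys, mirroring queryset.filter(pk__gt=last)[:chunksize].

    `sorted_pks` is an ascending list standing in for an ordered queryset
    (an assumption for illustration).
    """
    last = 0  # like `pk = 0` in the snippet: assumes positive integer keys
    while True:
        # Index of the first key strictly greater than `last` (the pk__gt filter)
        i = bisect.bisect_right(sorted_pks, last)
        chunk = sorted_pks[i:i + chunksize]  # the [:chunksize] limit
        if not chunk:  # no rows left above `last`
            return
        yield chunk
        last = chunk[-1]  # remember the highest key seen so far


chunks = list(keyset_batches([1, 2, 5, 7, 8, 13, 21], 3))
print(chunks)  # [[1, 2, 5], [7, 8, 13], [21]]
```

Because this version stops as soon as a query returns no rows, it needs no separate `last_pk` lookup, and gaps in the key sequence do not matter.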

e4c5 answered Sep 28 '22 21:09