I was trying to sample a few records from my queryset for performance like:
from random import sample
from my_app import MyModel
my_models = MyModel.objects.all()
# sample only a few of records for performance
my_models_sample = sample(my_models, 5)
for model in my_models_sample:
    model.some_expensive_calculation
But it seemed to make the execution time worse, not better.
How does random.sample() actually work behind the scenes? And is it a performance burden on Django querysets?
Use bulk queries to work with large data sets efficiently and reduce the number of database requests. The Django ORM can perform several insert or update operations in a single SQL query. If you're planning on inserting more than 5000 objects, specify batch_size.
Since random.sample() forces evaluation of the queryset my_models, the execution time of your program will depend heavily on the total number of MyModel objects in your database.
To improve performance and avoid loading the entire queryset into memory, you could implement your own sampling function as described here, together with the .iterator() method.
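The linked approach is not reproduced here. One common way to sample from a stream of unknown length is reservoir sampling, so the following is only a minimal sketch of what such a function might look like. The name reservoir_sample is hypothetical, and it takes any iterable so it can be tried without a database:

```python
import random


def reservoir_sample(iterable, k):
    """Pick up to k items uniformly at random from an iterable of unknown length.

    Only k items are held in memory at any time (Algorithm R).
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = random.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

With Django this could presumably be called as reservoir_sample(MyModel.objects.iterator(), 5). Note that every row is still read from the database once; the saving is in memory, since .iterator() does not cache the results on the queryset.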
Alternatively, you can rely on the database server to do the sampling for you via order_by('?') as follows:
MyModel.objects.order_by('?')[:5]
Personally, I wouldn't recommend the latter, as such queries may be expensive and slow depending on the database backend you're using (especially MySQL).
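If order_by('?') turns out to be too slow, one possible middle ground (not part of the answer above, just a commonly used pattern) is to sample primary keys in Python and then fetch only the chosen rows. The helper name sample_by_pk is hypothetical; the sketch assumes the argument is a Django queryset, using only values_list() and filter(pk__in=...):

```python
import random


def sample_by_pk(queryset, k):
    """Return up to k randomly chosen objects from a Django queryset."""
    # First query: fetch only the primary keys, which is much lighter
    # than loading full model instances.
    pks = list(queryset.values_list("pk", flat=True))
    # Sample in Python; cap k so small tables do not raise ValueError.
    chosen = random.sample(pks, min(k, len(pks)))
    # Second query: load just the chosen rows.
    return list(queryset.filter(pk__in=chosen))
```

This still transfers every primary key, so it is a trade-off rather than a guaranteed win; timing it against the other options on your own data is the only way to know.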
Why not let the database do the shuffling and limiting and compare the times?
MyModel.objects.order_by('?')[:5]
Although the documentation states that this may be expensive, in your case you are fetching all the rows anyway, so I suspect there will be a difference. The magnitude of the difference will depend on how big the data set is (and, of course, on your database backend).
You are using random.sample() on a QuerySet object.
If you actually want 5 random objects as a QuerySet, you can use this instead:
random_objects = MyModel.objects.all().order_by('?')[:5]
This will get you 5 random objects and reduce your time of sampling.
PS: I will also look into why random.sample() takes so long for that operation, and update this if I find something. :)