I am going to convert a Django QuerySet to a pandas DataFrame
as follows:
qs = SomeModel.objects.select_related().filter(date__year=2012) q = qs.values('date', 'OtherField') df = pd.DataFrame.from_records(q)
It works, but is there a more efficient way?
Well, Pandas is more performant because it's operating on data in memory. Django ORM is not as performant because it's operating on data that's loaded in over a network. With that said, although they are both operating on data, they solve different problems and could even be used in a complementary way.
A QuerySet represents a collection of objects from your database. It can have zero, one or many filters. Filters narrow down the query results based on the given parameters. In SQL terms, a QuerySet equates to a SELECT statement, and a filter is a limiting clause such as WHERE or LIMIT .
In this tutorial, you will learn how to use pandas in Django data. And convert a query set of data into a Data frame. Like how you convert a CSV data file into a Data Frame. And perform the data science operation right away in Django Views.
Django-filter is a generic, reusable application to alleviate writing some of the more mundane bits of view code. Specifically, it allows users to filter down a queryset based on a model's fields, displaying the form to let them do this. Adding a FilterSet with filterset_class. Using the filterset_fields shortcut.
import pandas as pd import datetime from myapp.models import BlogPost df = pd.DataFrame(list(BlogPost.objects.all().values())) df = pd.DataFrame(list(BlogPost.objects.filter(date__gte=datetime.datetime(2012, 5, 1)).values())) # limit which fields df = pd.DataFrame(list(BlogPost.objects.all().values('author', 'date', 'slug')))
The above is how I do the same thing. The most useful addition is specifying which fields you are interested in. If it's only a subset of the available fields you are interested in, then this would give a performance boost I imagine.
Django Pandas solves this rather neatly: https://github.com/chrisdev/django-pandas/
From the README:
class MyModel(models.Model): full_name = models.CharField(max_length=25) age = models.IntegerField() department = models.CharField(max_length=3) wage = models.FloatField() from django_pandas.io import read_frame qs = MyModel.objects.all() df = read_frame(qs)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With