I'm learning Django and its ORM data access methodology and there is something that I'm curious about. In one particular endpoint, I'm making a number of database calls (to Postgres) - below is an example of one:
projects = Project.objects\
            .filter(Q(first_appointment_scheduled=True) | (Q(active=True) & Q(phase=ProjectPhase.meet.value)))\
            .select_related('customer__first_name', 'customer__last_name',
                            'lead_designer__user__first_name', 'lead_designer__user__last_name')\
            .values('id')\
            .annotate(project=F('name'),
                      buyer=Concat(F('customer__first_name'), Value(' '), F('customer__last_name')),
                      designer=Concat(F('lead_designer__user__first_name'), Value(' '), F('lead_designer__user__last_name')),
                      created=F('created_at'),
                      meeting=F('first_appointment_date'))\
            .order_by('id')[:QUERY_SIZE]
As you can see, that's not a small query - I'm pulling in a lot of specific, related data and doing some string manipulation. I'm relatively concerned with performance so I'm doing the best I can to make things more efficient by using select_related() and values() to only get exactly what I need. 
The question I have is, conceptually and in broad terms, at what point does it become faster to just write my queries using parameterized SQL instead of using the ORM (since the ORM has to first "translate" the above "mess")? At what approximate level of query complexity should I switch over to raw SQL?
Any insight would be helpful. Thanks!
Long story short: for anything but the most basic of queries, ORMs require multiple queries to fetch all the data you need. This, more than anything else, is why ORMs are perceptually slower than raw SQL. Most existing benchmarks use flat CRUD queries that are too simplistic to capture this unfortunate characteristic.
This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.
The Django ORM provides many tools to express queries without writing raw SQL. For example: The QuerySet API is extensive. You can annotate and aggregate using many built-in database functions. Beyond those, you can create custom query expressions.
The question I have is, conceptually and in broad terms, at what point does it become faster to just write my queries using parameterized SQL instead of using the ORM (since the ORM has to first "translate" the above "mess")?
If you are asking about performance, Never.
The time taken to convert the ORM query into SQL will be very small compared to the time taken to actually execute that query. Brain cells are irreplaceable, servers are cheap.
If you are really do have performance issues the first place to look at is the your indexes in your models. Try printing out each of the queries generated by the ORM and run them in your psql console by prefixing EXPLAIN ANALYSE.
You can also use the django-debug-toolbar to automate this. In fact django-debug toolbar is an essential tool to hunt down bottlenecks. You will be surprised to note how often you have missed a simple select_related and how that causes hundreds of additional queries to be executed.
At what approximate level of query complexity should I switch over to raw SQL?
if you are asking about the ease of coding, it depends.
If the query is very very hard to write using the ORM and it's unreadable, yes, then it's perfectly fine to use a raw query. For example a query that has multiple aggregations, uses common table expressions, multiple joins etc can sometimes be hard to write as an ORM query, in that case if you are comfortable with raw sql writing it that way is fine.
Agreed with what @e4c5 said .
Additional translation layer for converting an ORM query to raw SQL query will effect performance.
However, this effect will depend on how much complex your query is?
When you use ORM, you can control the load on DB by increasing the processing in the app. In addition, this gives the opportunity to cache the result in the application itself.
At last, It totally depends on your schema , how complex your queries can be and how are you scaling your DB(Indices, replicas etc .)
For more read here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With