Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use subquery in django?

Tags:

I want to get a list of the latest purchase of each customer, sorted by the date.

The following query does what I want except for the date:

(Purchase.objects
         .all()
         .distinct('customer')
         .order_by('customer', '-date'))

It produces a query like:

SELECT DISTINCT ON 
    "shop_purchase.customer_id" 
    "shop_purchase.id" 
    "shop_purchase.date" 
FROM "shop_purchase" 
ORDER BY "shop_purchase.customer_id" ASC, 
         "shop_purchase.date" DESC;

I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.

I want to sort by the date, so what the query I really need should look like this:

SELECT * FROM (
  SELECT DISTINCT ON 
      "shop_purchase.customer_id" 
      "shop_purchase.id" 
      "shop_purchase.date" 
  FROM "shop_purchase" 
  ORDER BY "shop_purchase.customer_id" ASC, 
           "shop_purchase.date" DESC;
  )
AS result 
ORDER BY date DESC;

I don't want to sort using python because I still got to page limit the query. There can be tens of thousands of rows in the database.

In fact it is currently sorted by in python now and is causing very long page load times, so that's why I'm trying to fix this.

Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?

The actual models and methods are several pages long, but here is the set of models required for the queryset above.

class Customer(models.Model):
  user = models.OneToOneField(User)

class Purchase(models.Model):
  customer = models.ForeignKey(Customer)
  date = models.DateField(auto_now_add=True)
  item = models.CharField(max_length=255)

If I have data like:

Customer A - 
    Purchase(item=Chair, date=January), 
    Purchase(item=Table, date=February)
Customer B - 
    Purchase(item=Speakers, date=January), 
    Purchase(item=Monitor,  date=May)
Customer C - 
    Purchase(item=Laptop,  date=March), 
    Purchase(item=Printer, date=April)

I want to be able to extract the following:

Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table,   date=February)

There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.

This query will be able to extract that:

SELECT * FROM (
  SELECT DISTINCT ON 
    "shop_purchase.customer_id" 
    "shop_purchase.id" 
    "shop_purchase.date" 
  FROM "shop_purchase" 
  ORDER BY "shop_purchase.customer_id" ASC, 
           "shop_purchase.date" DESC;
) 
AS result 
ORDER BY date DESC;

I'm trying to find a way not to have to use raw SQL to achieve this result.

like image 661
Eric Avatar asked Jan 15 '13 04:01

Eric


2 Answers

This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.

Here is an example of something that may help:

  from django.db.models import Max
  Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))

This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:

SELECT "demo_customer"."id", 
       "demo_customer"."user_id", 
       MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
         "demo_customer"."user_id"

Another option, would be adding a property to your customer model that would look something like this:

  @property
  def latest_purchase(self):
    return self.purchase_set.order_by('-date')[0]

You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).

I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!

like image 51
Matthew J Morrison Avatar answered Oct 25 '22 13:10

Matthew J Morrison


Whenever there is a difficult query to write using Django ORM, I first try the query in psql(or whatever client you use). The SQL that you want is not this:

SELECT * FROM (
  SELECT DISTINCT ON 
    "shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date" 
  FROM "shop_purchase" 
  ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
  ) AS result 
ORDER BY date DESC;

In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.

If you need to find the last purchase of every customer, you need to do something like:

SELECT  "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1 

But the problem with the above query is that it will give you only the customer name and date. Using that will not help you in finding the records when you use these results in a subquery.

To use IN you need a list of unique parameters to identify a record, e.g., id

If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:

SELECT  max("shop_purchase.id") 
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";

Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.

The complete SQL will now be:

SELECT * 
FROM shop_customer 
WHERE "shop_customer.id" IN 
    (SELECT  max("shop_purchase.id") 
     FROM shop_purchase
     GROUP BY "shop_purchase.customer_id");

and using the Django ORM it looks like:

(Purchase.objects.filter(
    id__in=Purchase.objects
                   .values('customer_id')
                   .annotate(latest=Max('id'))
                   .values_list('latest', flat=True)))

Hope it helps!

like image 44
Ajay Yadav Avatar answered Oct 25 '22 13:10

Ajay Yadav