
Django - distinct rows/objects distinguished by date/day from datetime field

I've searched for quite a while and know about several answers on Stack Overflow, but none of the solutions works for me, even though my problem is pretty simple:

What I need (using Postgres + Django 1.10): I have many rows with many duplicate dates (= days) within a datetime field. I want a queryset containing one row/object per date/day.

fk | col1 | colX | created (type: datetime)
----------------------------------------------
1  | info | info | 2016-09-03 08:25:52.142617+00:00 <- get it (time does not matter)
1  | info | info | 2016-09-03 16:26:52.142617+00:00
2  | info | info | 2016-09-03 11:25:52.142617+00:00
1  | info | info | 2016-09-14 16:26:52.142617+00:00 <- get it (time does not matter)
3  | info | info | 2016-09-14 11:25:52.142617+00:00
1  | info | info | 2016-09-25 23:25:52.142617+00:00 <- get it (time does not matter)
1  | info | info | 2016-09-25 16:26:52.142617+00:00
1  | info | info | 2016-09-25 11:25:52.142617+00:00
2  | info | info | 2016-09-25 14:27:52.142617+00:00
2  | info | info | 2016-09-25 16:26:52.142617+00:00
3  | info | info | 2016-09-25 11:25:52.142617+00:00
etc.

What's the best (in terms of performance and Pythonic/Django style) way to do this? My model/table is going to have many rows (more than a million).

EDIT 1

The results must be filtered by a fk (e.g. WHERE fk = 1) first.

I already tried the most obvious things such as

MyModel.objects.filter(fk=1).order_by('created__date').distinct('created__date')

but got the following error:

django.core.exceptions.FieldError: Cannot resolve keyword 'date' into field. Join on 'created' not permitted.

I get the same error with all() and with the corresponding ordering set through class Meta instead of the order_by() query method.

Does anybody know more about this error in this specific case?

asked Oct 13 '16 at 11:10 by Schmalitz


1 Answer

This doesn't seem to be possible with the current Django implementation, since it would require advanced database backend functions (such as Postgres window functions).

The closest thing available is to use aggregation:

from django.db.models import Min
from django.db.models.functions import TruncDay

MyModel.objects.annotate(
    created_date=TruncDay('created')
).values('created_date').annotate(id=Min('id'))

This groups the rows by day and picks the minimal id within each group.

[{'created_date': datetime.date(2017, 3, 16), 'id': 146},
 {'created_date': datetime.date(2017, 3, 28), 'id': 188},
 {'created_date': datetime.date(2017, 3, 24), 'id': 178},
 {'created_date': datetime.date(2017, 3, 23), 'id': 171},
 {'created_date': datetime.date(2017, 3, 22), 'id': 157}] ...
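To make the grouping semantics concrete, here is a minimal pure-Python sketch of what the aggregation computes: one entry per calendar day, holding the smallest id seen on that day. The sample rows and the helper name min_id_per_day are hypothetical and only stand in for the model's data; the real work is done by the database, not in Python.

```python
from datetime import date, datetime

# Hypothetical sample rows: (id, created) pairs standing in for table rows.
rows = [
    (1, datetime(2016, 9, 3, 8, 25, 52)),
    (2, datetime(2016, 9, 3, 16, 26, 52)),
    (3, datetime(2016, 9, 14, 16, 26, 52)),
    (4, datetime(2016, 9, 25, 23, 25, 52)),
    (5, datetime(2016, 9, 25, 16, 26, 52)),
]


def min_id_per_day(rows):
    """Group rows by calendar day and keep the smallest id in each group."""
    result = {}
    for row_id, created in rows:
        day = created.date()  # truncate the datetime to its day
        if day not in result or row_id < result[day]:
            result[day] = row_id
    return result


picked = min_id_per_day(rows)
for day, row_id in sorted(picked.items()):
    print(day, row_id)
```

Note that the time of day plays no role in which row is chosen; only the id does, which matches the "time does not matter" requirement in the question.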

If you need the full objects, you can chain this with .values_list() and pass the result to another queryset, which produces a subquery:

MyModel.objects.filter(
    id__in=MyModel.objects.annotate(
        created_date=TruncDay('created')
    ).values('created_date').annotate(id=Min('id')).values_list(
        'id', flat=True
    )
)

FYI, this results in the following query:

SELECT
    "myapp_mymodel"."id",
    "myapp_mymodel"."created",
    "myapp_mymodel"."col1",
    "myapp_mymodel"."colX"
FROM "myapp_mymodel"
WHERE "myapp_mymodel"."id" IN (
    SELECT MIN(U0."id") AS "id"
    FROM "myapp_mymodel" U0
    GROUP BY DATE(U0."created")
)
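The question also asks for the rows to be restricted to one fk first. As a rough check of the query shape with that filter added, the sketch below runs an equivalent statement against an in-memory SQLite database using Python's standard sqlite3 module. The table name mymodel, the simplified timestamps (no microseconds or time zone), and the use of SQLite instead of Postgres are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, fk INTEGER, created TEXT)"
)

# (fk, created) pairs modelled on the table in the question; ids become 1..11.
sample = [
    (1, "2016-09-03 08:25:52"),
    (1, "2016-09-03 16:26:52"),
    (2, "2016-09-03 11:25:52"),
    (1, "2016-09-14 16:26:52"),
    (3, "2016-09-14 11:25:52"),
    (1, "2016-09-25 23:25:52"),
    (1, "2016-09-25 16:26:52"),
    (1, "2016-09-25 11:25:52"),
    (2, "2016-09-25 14:27:52"),
    (2, "2016-09-25 16:26:52"),
    (3, "2016-09-25 11:25:52"),
]
conn.executemany("INSERT INTO mymodel (fk, created) VALUES (?, ?)", sample)

# The fk filter sits inside the subquery, so MIN(id) is taken per day
# among the rows of that fk only.
query = """
SELECT id, fk, created
FROM mymodel
WHERE id IN (
    SELECT MIN(id)
    FROM mymodel
    WHERE fk = ?
    GROUP BY DATE(created)
)
ORDER BY created
"""
picked = conn.execute(query, (1,)).fetchall()
conn.close()

for row in picked:
    print(row)
```

In ORM terms this would presumably correspond to adding .filter(fk=1) to the inner queryset before the annotate() call. Filtering only the outer queryset is not equivalent, because the minimal id for a given day may belong to a different fk.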
answered Sep 21 '22 at 10:09 by Antwan