Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django get min and max value from PostgreSQL specific ArrayField holding IntegerField(s)

I have an ArrayField in my model holding IntegerFields. I'm looking for the best way in which I can extract the smallest and the biggest integer out of this array.

The Django documentation doesn't give examples of something similar and I wonder if the right way would be to find out how and if I can use aggregation function on this ArrayField or if I can cast this to normal python list of integers somehow and use the built-in min, max functions?

Any suggestions for performing this would be helpful. Examples even more.

like image 440
nikitz Avatar asked Feb 16 '21 19:02

nikitz


2 Answers

Python's max and min functions seem to do it.

Model definition:

from django.contrib.postgres.fields import ArrayField

class TestArrayField(models.Model):
    integers = ArrayField(
        models.IntegerField()
    )

Sample data (two TestArrayField model instances, each with an array of IntegerFields):

enter image description here

This returns the min/max for a given instance of TestArrayField:

def test_array_field(request):
    # grabbing the first one, you could use .filter() instead
    integers_array = TestArrayField.objects.first().integers
    max_integer = max(integers_array)
    min_integer = min(integers_array)

    # printing output..
    print(integers_array)
    print('max_integer:', max_integer)
    print('min_integer:', min_integer)

And here's the output:

[6, 8, 10, 12]
max_integer: 12
min_integer: 6
like image 139
Mike Avatar answered Oct 16 '22 15:10

Mike


This is an example from scratch to a working Django with PostgreSQL in reproducible Docker containers, as well as the min/max via various approaches in case there's no reproducible/helpful example found. I just had similar thing on my mind, so I've created it just for fun and feel free to use it if it helps. :)

Starting with a Docker Compose I create 2 containers - one for Django, one for PostgreSQL and set some defaults for it such as password, connectivity, etc as well as mount the current working directory to overlay the filesystem to be able to edit on demand without rebuilding.

Example docker-compose.yml for testing:

version: "3.7"
services:
    my-postgres:
        image: postgres:alpine
        environment:
            POSTGRES_PASSWORD: password
    django:
        build:
            context: "."
            dockerfile: Dockerfile
        environment:
            ENGINE: "django.db.backends.postgresql_psycopg2"
            USER: postgres
            PASSWORD: password
            HOST: "my-postgres"
            PORT: 5432
            NAME: postgres
        volumes:
            - .:/tmp

For the client there is necessary to have a compiler (unless you have a different package), headers, shared objects (libraries). For Alpine you also need musl-dev for limits.h and other headers + shared objects.

Used Dockerfile:

FROM python:alpine
RUN apk add postgresql postgresql-dev gcc musl-dev
WORKDIR /tmp
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
WORKDIR /tmp/app
ENTRYPOINT python manage.py runserver

requirements.txt

asgiref==3.3.1
Django==3.1.6
pytz==2021.1
sqlparse==0.4.1
psycopg2==2.8.6

Tree

in the current directory the django-admin command was issued, that's why you see app/app structure which might seem weird. That's because of the naming of the project in django-admin command as app.

.
├── app
│   ├── app
│   │   ├── asgi.py
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── manage.py
│   └── pg
│       ├── admin.py
│       ├── apps.py
│       ├── __init__.py
│       ├── migrations
│       │   ├── 0001_initial.py
│       │   └── __init__.py
│       ├── models.py
│       ├── tests.py
│       └── views.py
├── docker-compose.yml
├── Dockerfile
└── requirements.txt

How to run:

docker-compose build
docker-compose up -d
docker-compose logs -f                            # to check the logs
docker-compose exec django python manage.py shell # to play in REPL
docker-compose exec django sh                     # to play in shell

Modified files:

settings.py:

# modified
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'pg'
]

from os import environ
DATABASES = {
    'default': {
        item: environ.get(item)
        for item in ["ENGINE", "NAME", "USER", "PASSWORD", "HOST", "PORT"]
    }
}
# modified end

pg application

it is a separate Django application created from scratch in an empty Django project via the created manage.py file:

django-admin startproject app
cd app
python manage.py startapp pg

# edit models.py before migrations step (of course)
python manage.py makemigrations pg
python manage.py migrate

pg/models.py:

from django.db import models
from django.contrib.postgres.fields import ArrayField

class MyModel(models.Model):
    array = ArrayField(models.IntegerField())

    def __str__(self):
        return f"<My Model ({self.id})>: {self.array}"

Insert

is the same as if an ordinary Django model:

>>> from pg import models
>>> models.MyModel.objects.all()
<QuerySet []>
>>> models.MyModel.objects.create(array=[1,5,9])
<MyModel: MyModel object (1)>
>>> models.MyModel.objects.create(array=[2,1,0])
<MyModel: MyModel object (2)>
>>> models.MyModel.objects.all()
<QuerySet [<MyModel: <My Model (1)>: [1, 5, 9]>, <MyModel: <My Model (2)>: [2, 1, 0]>]>

Min/Max

can be done in various ways depending on whether you want to leverage Python or the aggregating

>>> # single object min/max, uses Django i.e. your machine
>>> min(models.MyModel.objects.get(id=1).array)
1
>>> max(models.MyModel.objects.get(id=1).array)
9
>>> # sum min/max, same machine
>>> min([sum(item.array) for item in models.MyModel.objects.all()])
3
>>> max([sum(item.array) for item in models.MyModel.objects.all()])
15

Aggregating

>>> from django.db.models import Min, Max
>>> models.MyModel.objects.aggregate(Min("array"))
{'array__min': [1, 5, 9]}
>>> models.MyModel.objects.aggregate(Max("array"))
{'array__max': [2, 1, 0]}

Executing over the array items seems rather problematic through Django itself, but with a little bit of a raw query there's no problem (use min() for minimum):

from django.db.models.expressions import RawSQL
>>> models.MyModel.objects.annotate(max=RawSQL(
    "select (select max(x) from unnest(pg_mymodel.array) x)",
    params=[]
)).values("max")
<QuerySet [{'max': 9}, {'max': 2}]>

Even with single items:

>>> models.MyModel.objects.filter(id=2).annotate(max=RawSQL(
    "select (select max(x) from unnest(pg_mymodel.array) x)",
    params=[]
)).values("max")
<QuerySet [{'max': 2}]>

Depending on your use case it might be cheaper/faster to leverage PostgreSQL to do the heavy lifting (prepare data) or just to pull the raw data without much editing from the DB as "raw" and then e.g. via multiprocessing process the data into a shape you desire leveraging your machine for the heavy computing.

And if you really dislike even raw queries, you can create your own aggregating Expression. Though sometimes it's just easier/faster to either create a SQL view (can be created via a migration) with pre-made data, build a model for that and pull data via ORM. Use the best tool for the job, but don't spend too much time on it if not required for performance.

like image 3
Peter Badida Avatar answered Oct 16 '22 15:10

Peter Badida