
Lateral join in a Django queryset (in order to use the jsonb_to_recordset PostgreSQL function)

I have a model "myModel" that saves some data in a (PostgreSQL) JSONField named "json". The typical structure of the JSON data is: {"key": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}.
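
For reference, here is a minimal sketch of the model this assumes (model and field names are taken from the question; on Django 3.1+ you would use models.JSONField instead of the postgres-specific import):

from django.contrib.postgres.fields import JSONField
from django.db import models

class myModel(models.Model):
    # e.g. {"key": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
    json = JSONField(default=dict)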

I would like to filter the myModel queryset according to the values of "a" or "b". I may also want to aggregate over "a" or "b".

So "unnesting" the (json -> key) array would be very appreciated, but I can't figure out how to do this with django api.

I have tried to perform the "unnesting" directly in PostgreSQL via the following SQL query:

SELECT * 
FROM "myModel"
join lateral jsonb_to_recordset("myModel"."json" -> 'key') as r("a" int, "b" int) on true
LIMIT 5

We can even make it more compact using the shortcut notation for the lateral join:

SELECT * 
FROM "myModel", jsonb_to_recordset("myModel"."json" -> 'key') as r("a" int, "b" int)
LIMIT 5

But I have no idea how to do something equivalent using the Django API. I've tried a few things with annotate and RawSQL, but none of them seems to act on the FROM clause, which is where I would actually need to add the jsonb_to_recordset call. I could probably use the raw() function to place my raw SQL, but that would mean I can't filter or aggregate on the joined queryset using the Django API; I'd have to do everything in the raw SQL, which is not very convenient for what I have to do.
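
For illustration, a rough sketch of that raw() fallback (the table name "myApp_mymodel" is assumed): it does run the lateral join, but it returns a RawQuerySet, so any further filtering or aggregation has to live inside the SQL string rather than go through the ORM:

rows = myModel.objects.raw(
    """
    SELECT "myApp_mymodel".*, r."a", r."b"
    FROM "myApp_mymodel",
         jsonb_to_recordset("myApp_mymodel"."json" -> 'key') AS r("a" int, "b" int)
    LIMIT 5
    """
)
for row in rows:
    print(row.a, row.b)  # extra SELECT columns become attributes on the instances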

Another approach would be to use the queryset's extra() method, which allows adding an additional table to the SQL FROM clause. Unfortunately, if I do:

qs = myModel.objects.all()
qs = qs.extra(tables = ["""jsonb_to_recordset("myApp_myModel"."json" -> 'key') as r("a" int, "b" int)"""])
qs = qs.values()
print(qs.query)

I get the query Django would execute:

SELECT * 
FROM "myModel", "jsonb_to_recordset("myModel"."json" -> 'key') as r("a" int, "b" int)"

That's pretty close to what I need... except that Django has added extra quotes around the extra "table" name I provided, so the function call doesn't work anymore.

Any idea of how to handle this?

Thanks in advance, Loic

asked Sep 25 '19 by user2357068


1 Answer

Even though it's almost 6 months since you asked about this problem, I'm trying to add my contribution to it.

Recently, while researching PostgreSQL jsonb functions usable through the Django ORM (for the purpose of building a reporting engine library for Django), I landed on your problem statement, and it turned out I was stuck on exactly the same problem: how could I use JSON arrays with aggregation/annotation functions, so I could report easily to the front-end and show graphs and tables?

Finally, after 3 days of trial and error and head-banging against the study room walls, I found a way to resolve it.

It might not be an ideal way, but I tried to keep it as close to ideal as possible, and also tried to avoid any kind of possible SQL injection. Let's not waste time on small talk.


Jumping right into the implementation:

from django.db.models.constants import LOOKUP_SEP
from django.db.models.sql.datastructures import BaseTable


class JsonbFunction:
    # (PostgreSQL function name, JSON path operator used to reach the nested array)
    JSONB_TO_RECORDSET = ("jsonb_to_recordset", '#>')


class JsonbFunctionTable(BaseTable):
    # Rendered as: JOIN LATERAL <function>(<table>.<field> #> '<path>') <alias>(<columns>) ON TRUE
    jsonb_join_type = ("JOIN LATERAL", "ON TRUE")
    function_name = None
    function_alias = None

    def __init__(self, table_name, function_name, field_name, columns):
        # field_name is e.g. 'json__key': the model field, then the JSON path to the array.
        field_name_seq = field_name.split(LOOKUP_SEP)
        self.model_field = field_name_seq[0]
        self.json_path = field_name_seq[1:]
        # Keep the original arguments so the join can be cloned later.
        self.init_field_name = field_name
        self.init_columns = columns
        alias = self.model_field + 's'
        super(JsonbFunctionTable, self).__init__(table_name=table_name, alias=alias)
        self.function_name = function_name
        # Build the recordset column definitions, e.g. 'a int,b int'.
        column_definitions = list()
        for _c in columns:
            column_definitions.append('{c_name} {c_type}'.format(c_name=_c[0], c_type=_c[1]))
        self.function_alias = '{alias}({column_definitions})'.format(
            alias=alias, column_definitions=','.join(column_definitions))

    def as_sql(self, compiler, connection):
        # Renders e.g.: JOIN LATERAL jsonb_to_recordset("myapp_mymodel"."json" #> '{key}') jsons(a int,b int) ON TRUE
        return "{join} {function}({table}.{field} {sign} '{json_path}') {f_alias} {condition}".format(
            join=self.jsonb_join_type[0],
            condition=self.jsonb_join_type[1],
            f_alias=self.function_alias,
            function=self.function_name[0],
            sign=self.function_name[1],
            table=compiler.quote_name_unless_alias(self.table_name),
            field=compiler.quote_name_unless_alias(self.model_field),
            json_path='{' + ",".join(self.json_path) + '}'
        ), []

    def relabeled_clone(self, change_map):
        # Rebuild the join from its original arguments and remap the alias if needed.
        clone = self.__class__(
            self.table_name, self.function_name, self.init_field_name, self.init_columns)
        clone.table_alias = change_map.get(self.table_alias, self.table_alias)
        return clone

    def equals(self, other, with_filtered_relation):
        return (
                isinstance(self, other.__class__) and
                self.table_name == other.table_name and
                self.table_alias == other.table_alias and
                self.function_name == other.function_name and
                self.jsonb_join_type == other.jsonb_join_type
        )

Explanation:

The jsonb_to_recordset function internally generates a table, and to get what we want we have to join that returned table with every corresponding row of our Django model. The only way to do that is a LATERAL JOIN against the table returned by jsonb_to_recordset.

JsonbFunctionTable extends the BaseTable class, which allows you to push a custom table JOIN into your Django queryset.
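
For a sense of what this produces: with field_name='json__key' and columns=[('a', 'int'), ('b', 'int')] (and assuming the model's table is "myApp_mymodel"), the as_sql() method above should render a join fragment roughly like

JOIN LATERAL jsonb_to_recordset("myApp_mymodel"."json" #> '{key}') jsons(a int,b int) ON TRUE

which is essentially the explicit lateral join from the question, just using the #> path operator instead of ->.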


Now let's jump into the next code segment:

from django.contrib.postgres.fields import JSONField
from django.db.models import F, CharField, Expression
from django.db.models.constants import LOOKUP_SEP
from django.utils.deconstruct import deconstructible


class ForceColumn(Expression):
    """
    Represents the SQL of a raw column reference without the table name.

    Unlike Django's regular Col, it emits target.db_column verbatim, so it can
    reference columns of the lateral-joined recordset (e.g. jsons.a) that
    Django's field validation knows nothing about.
    """
    contains_column_references = True

    def __init__(self, target, output_field=None):
        if output_field is None:
            output_field = target
        super().__init__(output_field=output_field)
        self.target = target

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.target)

    def as_sql(self, compiler, connection):
        return self.target.db_column, []

    def get_group_by_cols(self):
        return [self]

    def get_db_converters(self, connection):
        if self.target == self.output_field:
            return self.output_field.get_db_converters(connection)
        return (
                self.output_field.get_db_converters(connection) +
                self.target.get_db_converters(connection)
        )


@deconstructible
class ForceF(F):
    model = None
    json_field = None

    def __init__(self, model, name, json_field):
        super(ForceF, self).__init__(name)
        self.model = model
        self.json_field = json_field

    def resolve_expression(self, query=None, allow_joins=True, reuse=None,
                           summarize=False, for_save=False, simple_col=False):
        # Build the raw column reference, e.g. 'jsons.a' for json_field='json', name='a'.
        _field_ref = self.json_field + 's.' + self.name.replace(LOOKUP_SEP, '.')
        # Make sure the referenced model field really is a JSONField.
        path, final_field, targets, rest = query.names_to_path(
            [self.json_field], query.get_meta(), query.get_initial_alias())
        if not isinstance(final_field, JSONField):
            raise Exception('`ForceF` is only available for JSON fields')
        _dummy_field = CharField(db_column=_field_ref, name=_field_ref)
        _dummy_field.model = self.model
        return ForceColumn(_dummy_field, _dummy_field)

Explanation

Here I have defined a ForceF expression extending Django's F. The reason is that in the query you will have to select/order_by/group_by on the dynamically generated JSON column's "path to your value". As you'd expect, Django won't allow that out of the box, because it tries to validate your dynamically generated JSON column name against the regular fields available on the model, which blocks you again. ForceF is there to resolve that issue. It also guards against possible SQL injection by enforcing a naming convention for the generated table's columns.
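
As a quick illustration of what ForceColumn ends up emitting (the alias 'jsons' and column 'a' are assumed; as_sql() ignores its compiler/connection arguments here, so we can call it directly):

from django.db.models import CharField

_ref = CharField(db_column='jsons.a', name='jsons.a')
print(ForceColumn(_ref, _ref).as_sql(compiler=None, connection=None))
# -> ('jsons.a', [])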


Here is how you would finally use it with your Django queryset:

results = myModel.objects.filter()
# Adding extra JOIN for JSONb
join_config = JsonbFunctionTable(
    table_name=myModel._meta.db_table,
    function_name=JsonbFunction.JSONB_TO_RECORDSET,
    field_name='json__key', # Your JSON Field and path to that array
    columns=[('a', 'int'), ('b', 'int')]
)
results.query.join(join=join_config)
# Group by as you need
results = results.values('group_by_somethings')
# Now finally force annotate with your dynamically generated JSON Columns
results = results.annotate(
    jsonb_annotated=ForceF(model=myModel, name='a', json_field='json')
)
results = results.values("value_1", "value_2", "jsonb_annotated")
# Check generated query
print(results.query)
# Check generated results
print(list(results))
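
And since the original question also asks about aggregation, here is an untested sketch of the direction this could be extended in (the explicit output_field is needed because the ForceF annotation above is typed as a CharField on the Django side, while the recordset column is an int in SQL):

from django.db.models import IntegerField, Sum

totals = results.aggregate(total_a=Sum('jsonb_annotated', output_field=IntegerField()))
print(totals)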

Footnote:

This is something I built for my own needs so far. It might have unintended consequences later, but for now it works as expected for what I needed. It may not meet your exact requirements, but I am fairly sure this implementation can be extended to solve the problems you mentioned.

And whoever is reading this crazy (or possibly bad) idea, please don't hesitate to share your concerns with us.

answered Sep 28 '22 by Shamil Sakib