Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the correct way to use distinct on (Postgres) with SqlAlchemy?

I want to get all the columns of a table with max(timestamp) and group by name.

What i have tried so far is: normal_query ="Select max(timestamp) as time from table"

event_list = normal_query \
            .distinct(Table.name)\
            .filter_by(**filter_by_query) \
            .filter(*queries) \
            .group_by(*group_by_fields) \
            .order_by('').all()

the query i get :

SELECT  DISTINCT ON (schema.table.name) , max(timestamp)....

this query basically returns two columns with name and timestamp.

whereas, the query i want :

SELECT DISTINCT ON (schema.table.name) * from table order by ....

which returns all the columns in that table.Which is the expected behavior and i am able to get all the columns, how could i right it down in python to get to this statement?.Basically the asterisk is missing. Can somebody help me?

like image 649
Shubham Avatar asked Jul 29 '19 11:07

Shubham


People also ask

How does distinct work in PostgreSQL?

The DISTINCT clause is used in the SELECT statement to remove duplicate rows from a result set. The DISTINCT clause keeps one row for each group of duplicates. The DISTINCT clause can be applied to one or more columns in the select list of the SELECT statement.

Can we use distinct in PostgreSQL?

Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.

Can I use SQLAlchemy with PostgreSQL?

This SQLAlchemy engine is a global object which can be created and configured once and use the same engine object multiple times for different operations. The first step in establishing a connection with the PostgreSQL database is creating an engine object using the create_engine() function of SQLAlchemy.

Does SQLAlchemy require primary key?

¶ The SQLAlchemy ORM, in order to map to a particular table, needs there to be at least one column denoted as a primary key column; multiple-column, i.e. composite, primary keys are of course entirely feasible as well.


1 Answers

What you seem to be after is the DISTINCT ON ... ORDER BY idiom in Postgresql for selecting greatest-n-per-group results (N = 1). So instead of grouping and aggregating just

event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    all()

This will end up selecting rows "grouped" by name, having the greatest timestamp value.

You do not want to use the asterisk most of the time, not in your application code anyway, unless you're doing manual ad-hoc queries. The asterisk is basically "all columns from the FROM table/relation", which might then break your assumptions later, if you add columns, reorder them, and such.

In case you'd like to order the resulting rows based on timestamp in the final result, you can use for example Query.from_self() to turn the query to a subquery, and order in the enclosing query:

event_list = Table.query.\
    distinct(Table.name).\
    filter_by(**filter_by_query).\
    filter(*queries).\
    order_by(Table.name, Table.timestamp.desc()).\
    from_self().\
    order_by(Table.timestamp.desc()).\
    all()
like image 144
Ilja Everilä Avatar answered Nov 06 '22 07:11

Ilja Everilä