
Using window functions to LIMIT a query with SQLAlchemy on Postgres

I'm trying to write the following SQL query with the SQLAlchemy ORM:

SELECT * FROM
   (SELECT *, row_number() OVER(w)
    FROM (select distinct on (grandma_id, author_id) * from contents) as c
    WINDOW w AS (PARTITION BY grandma_id ORDER BY RANDOM())) AS v1
WHERE row_number <= 4;
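To make the intent of the query concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database using only the standard library. It is an illustration, not the Postgres query itself: SQLite has no DISTINCT ON, so the deduplication step is swapped for a GROUP BY that keeps the lowest id per (grandma_id, author_id) pair. The function name sample_per_group, the column types, and the sample rows are all hypothetical, and the sketch assumes the bundled SQLite is 3.25 or newer (the first release with window functions).

```python
import sqlite3


def sample_per_group(rows, per_group=4):
    """Return at most `per_group` randomly chosen rows per grandma_id,
    after keeping one row per (grandma_id, author_id) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contents ("
        "id INTEGER PRIMARY KEY, grandma_id INTEGER, author_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO contents (id, grandma_id, author_id) VALUES (?, ?, ?)",
        rows,
    )
    query = """
        SELECT id, grandma_id, author_id
        FROM (
            SELECT c.*,
                   row_number() OVER (
                       PARTITION BY grandma_id ORDER BY random()
                   ) AS rn
            FROM (
                -- GROUP BY stands in for Postgres' DISTINCT ON here
                SELECT min(id) AS id, grandma_id, author_id
                FROM contents
                GROUP BY grandma_id, author_id
            ) AS c
        ) AS v1
        WHERE rn <= ?
    """
    result = conn.execute(query, (per_group,)).fetchall()
    conn.close()
    return result


# Hypothetical data: grandma 1 has six distinct authors (author 10 appears
# twice), grandma 2 has two authors.
rows = [
    (1, 1, 10), (2, 1, 10), (3, 1, 11), (4, 1, 12), (5, 1, 13),
    (6, 1, 14), (7, 1, 15), (8, 2, 20), (9, 2, 21),
]

for content_id, grandma_id, author_id in sample_per_group(rows):
    print("%s\t%s\t%s" % (content_id, grandma_id, author_id))
```

Which rows come back for grandma 1 varies between runs because of ORDER BY random(); only the per-group cap and the uniqueness of (grandma_id, author_id) are fixed.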

This is what I've done so far:

s = Session()

unique_users_contents = (s.query(Content).distinct(Content.grandma_id,
                                                  Content.author_id)
                         .subquery())

windowed_contents = (s.query(Content,
                             func.row_number()
                             .over(partition_by=Content.grandma_id,
                                   order_by=func.random()))
                     .select_from(unique_users_contents)).subquery()

contents = (s.query(Content).select_from(windowed_contents)
            .filter(row_number <= 4))  # how can I reference the row_number() value?

result = contents
for content in result:
    print("%s\t%s\t%s" % (content.id, content.grandma_id,
                          content.author_id))

As you can see, the query is mostly modeled, but I don't know how to reference the row_number() result of the subquery from the outer query's WHERE clause. I tried windowed_contents.c.row_number and adding a label() call to the window function, but neither worked, and I couldn't find a similar example in the official docs or on Stack Overflow.

How can this be accomplished? And also, could you suggest a better way to do this query?

gonz asked Jul 02 '13 23:07



1 Answer

Referencing windowed_contents.c.row_number against a label() is how you'd do it, and it works for me. Note that the select_entity_from() method is new in SQLAlchemy 0.8.2 and will be needed here in 0.9 in place of select_from():

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Content(Base):
    __tablename__ = 'contents'

    grandma_id = Column(Integer, primary_key=True)
    author_id = Column(Integer, primary_key=True)


s = Session()

unique_users_contents = s.query(Content).distinct(
                            Content.grandma_id, Content.author_id).\
                            subquery('c')

q = s.query(
        Content,
        func.row_number().over(
                partition_by=Content.grandma_id,
                order_by=func.random()).label("row_number")
    ).select_entity_from(unique_users_contents).subquery()

q = s.query(Content).select_entity_from(q).filter(q.c.row_number <= 4)

print(q)  # printing the Query shows the SQL it generates
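On newer SQLAlchemy releases, select_entity_from() is deprecated in favor of aliased(). The following is a sketch of the same query in that style, not a tested drop-in replacement: it assumes SQLAlchemy 1.4 or later, adds an id primary key to Content (the question prints content.id, but the column definition is an assumption), and only compiles the statement against the PostgreSQL dialect to inspect the SQL, so no database connection is needed.

```python
from sqlalchemy import Column, Integer, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Content(Base):
    __tablename__ = "contents"

    # Assumed schema: a surrogate id plus the two columns from the question.
    id = Column(Integer, primary_key=True)
    grandma_id = Column(Integer)
    author_id = Column(Integer)


s = Session()  # unbound session; enough for building and compiling queries

# Inner subquery: one row per (grandma_id, author_id) via DISTINCT ON.
unique_sq = (
    s.query(Content)
    .distinct(Content.grandma_id, Content.author_id)
    .subquery("c")
)
unique_content = aliased(Content, unique_sq)

# Middle subquery: number the rows of each grandma_id in random order.
windowed_sq = s.query(
    unique_content,
    func.row_number()
    .over(partition_by=unique_content.grandma_id, order_by=func.random())
    .label("row_number"),
).subquery("v1")
windowed_content = aliased(Content, windowed_sq)

# Outer query: the labeled column is reachable as windowed_sq.c.row_number.
q = s.query(windowed_content).filter(windowed_sq.c.row_number <= 4)

sql = str(q.statement.compile(dialect=postgresql.dialect()))
print(sql)
```

The key point is unchanged from the version above: the window function gets a label() inside the subquery, and the outer filter refers to it through the subquery's .c collection.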
zzzeek answered Oct 25 '22 05:10
