Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy requires query to be aliased yet that alias is not used in the generated SQL

I have a simple model class that represents a battle between two characters:

class WaifuPickBattle(db.Model):
    """Table which represents a where one girl is chosen as a waifu."""

    __tablename__ = "waifu_battles"
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey("users.id"), nullable=False)
    date = db.Column(db.DateTime, nullable=False)
    winner_name = db.Column(db.String, nullable=False)
    loser_name = db.Column(db.String, nullable=False)

I have a method which constructs a CTE which projects the battles into a series of appearences (each battle has two appearences - the winner and the loser):

def get_battle_appearences_cte():
    """Create a sqlalchemy subquery of the battle appearences."""
    wins = select([
        WaifuPickBattle.date,
        WaifuPickBattle.winner_name.label("name"),
        expression.literal_column("1").label("was_winner"),
        expression.literal_column("0").label("was_loser")
    ])
    losses = select([
        WaifuPickBattle.date,
        WaifuPickBattle.loser_name.label("name"),
        expression.literal_column("0").label("was_winner"),
        expression.literal_column("1").label("was_loser")
    ])
    return wins.union_all(losses).cte("battle_appearence")

I then have a query which utilises this view to determine the characters which have seen the most battles:

def query_most_battled_waifus():
    """Find the waifus with the most battles in a given date range."""
    appearence_cte = get_battle_appearences_cte()
    query = \
        select([
            appearence_cte.c.name,
            func.sum(appearence_cte.c.was_winner).label("wins"),
            func.sum(appearence_cte.c.was_loser).label("losses"),
        ])\
        .group_by(appearence_cte.c.name)\
        .order_by(func.count().desc())\
        .limit(limit)
    return db.session.query(query).all()

This generates the following SQL:

WITH battle_appearence  AS
(
    SELECT
        waifu_battles.date AS date,
        waifu_battles.winner_name AS name,
        1 AS was_winner,
        0 AS was_loser
    FROM waifu_battles
    UNION ALL
    SELECT
        waifu_battles.date AS date,
        waifu_battles.loser_name AS name,
        0 AS was_winner,
        1 AS was_loser
    FROM waifu_battles
)
SELECT
    name AS name,
    wins AS wins,
    losses AS losses
FROM
(
    SELECT
        battle_appearence.name AS name,
        sum(battle_appearence.was_winner) AS wins,
        sum(battle_appearence.was_winner) AS losses
    FROM battle_appearence
    GROUP BY battle_appearence.name
    ORDER BY count(*) DESC
)

This works perfectly fine when executing against a SQLite database, but when executing against a Postgres SQL database the following error is given:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) subquery in FROM must have an alias

LINE 6: FROM (SELECT battle_appearence.name AS name, count(battle_ap... ^ HINT: For example, FROM (SELECT ...) [AS] foo.

[SQL: WITH battle_appearence AS (SELECT waifu_battles.date AS date, waifu_battles.winner_name AS name, 1 AS was_winner, 0 AS was_loser FROM waifu_battles UNION ALL SELECT waifu_battles.date AS date, waifu_battles.loser_name AS name, 0 AS was_winner, 1 AS was_loser FROM waifu_battles) SELECT name AS name, wins AS wins, losses AS losses FROM (SELECT battle_appearence.name AS name, count(battle_appearence.was_winner) AS wins, count(battle_appearence.was_winner) AS losses FROM battle_appearence GROUP BY battle_appearence.name ORDER BY count(*) DESC)] (Background on this error at: http://sqlalche.me/e/f405)

There are a few things to notice at this point:

  1. The sub-select is redundant, we should simply using the sub-select as the main select statement.
  2. You could resolve this by aliasing the sub-select and using <alias>.<column> in the main select statement - Postgres requiring an alias on the sub-selects is well documented elsewhere.

My first question is how would I alias this sub-select seeing that SQLalchemy decides to introduce it despite not being explicitly instructed to (as far as I can tell)?

I found a solution to the problem was to add .alias("foo") to the query:

query = query\
        ...\
        .alias("foo")

Which casuses the following SQL to be generated (one that weirdly resolved the whole redundant sub-select issue as well!):

WITH battle_appearence  AS
(
    SELECT
        waifu_battles.date AS date,
        waifu_battles.winner_name AS name,
        1 AS was_winner,
        0 AS was_loser
    FROM waifu_battles
    UNION ALL
    SELECT
        waifu_battles.date AS date,
        waifu_battles.loser_name AS name,
        0 AS was_winner,
        1 AS was_loser
    FROM waifu_battles
)
SELECT
    battle_appearence.name,
    sum(battle_appearence.was_winner) AS wins,
    sum(battle_appearence.was_winner) AS losses
FROM battle_appearence
GROUP BY battle_appearence.name
ORDER BY count(*) DESC

My second question is why did adding the alias prevent the sub-select from being created and why is the alias not used! The "foo" alias was seemingly disregarded yet had a substantial effect on the generated query.

like image 970
HennyH Avatar asked May 08 '19 11:05

HennyH


People also ask

How to use alias () function in SQLAlchemy?

The alias () function in sqlalchemy.sql module represents an alias, as typically applied to any table or sub-select within a SQL statement using the AS keyword. from sqlalchemy.sql import alias st = students.alias("a") This alias can now be used in select () construct to refer to students table − s = select([st]).where(st.c.id>2)

What is alias () in SQL Server?

aliased () or alias () are used whenever you need to use the SELECT ... FROM my_table my_table_alias ... construct in SQL, mostly when using the same table more than once in a query (self-joins, with or without extra tables).

Can we use alias for columns as well as tables?

We can use the aliases for columns as well as for tables. We will study in a described way each of the cases further. How does SQL Alias work? let us study how does alias work: 1. Aliases to the columns

What should we have to be careful while using aliases?

We have to careful while using aliases only about the name that we use for specifying the alias. The name of the alias should be such that it is meaningful and relevant to the object to which it is being assigned.


1 Answers

Answers

SQLalchemy decides to introduce it despite not being explicitly instructed to

It does not. You are telling it to use the sub-query the very moment you call db.sesion.query(query) (though you may not be aware of it). Use db.session.execute(query) instead.

why did adding the alias prevent the sub-select from being created and why is the alias not used! The "foo" alias was seemingly disregarded yet had a substantial effect on the generated query.

It did not and it is used.

Explanation - introduction

SQLAlchemy just tricked you. I presume that you have been using print(query) to peek under the hood and understand what is wrong - bad luck this time, it did not tell you the entire truth.

To view the real SQL which was generated, turn the echo functionality on in the engine. Once done, you will discover that in reality, the sqlalchemy has generated the following query:

WITH battle_appearence AS 
(
    SELECT
        waifu_battles.date AS date,
        waifu_battles.winner_name AS name,
        1 AS was_winner,
        0 AS was_loser 
    FROM waifu_battles
    UNION ALL
    SELECT
        waifu_battles.date AS date,
        waifu_battles.loser_name AS name,
        0 AS was_winner,
        1 AS was_loser 
    FROM waifu_battles
)
SELECT foo.name AS foo_name, foo.wins AS foo_wins, foo.losses AS foo_losses 
FROM (
    SELECT
        battle_appearence.name AS name,
        sum(battle_appearence.was_winner) AS wins,
        sum(battle_appearence.was_loser) AS losses 
    FROM battle_appearence
    GROUP BY battle_appearence.name
    ORDER BY count(*) DESC
    LIMIT ?
)
AS foo

Both queries work correctly (the one that I claim to be really used - above - and the query you gave at the end of your answer). Let's dive into this first - why are these different?

How to debug the queries and why what you saw was different

The query that you saw (let's call it S as select over alias) is the string representation of the query or the result of str(query.compile()). You can adjust it to use the postgres dialect:

dialect = postgresql.dialect()
str(query.compile(dialect=dialect))

and get a slightly different result but still with no subquery. Intriguing, isn't it? Just for future reference, the query.compile is (in simplification) the same as calling dialect.statement_compiler(dialect, query, bind=None)

The second query (let's call it A as aliased) is generated when calling db.session.query(query).all(). If you just type str(db.session.query(query)), you will see that we get a different query (compared to N's query.compile()) - with a subsquery and with an alias.

Does it have anything to do with the session? No - you can check that by converting the query to Query object, disregarding the session information:

from sqlalchemy.orm.query import Query
str(Query(query))

Peeking into the implementation details (Query.__str__) we can see that what is going on for A is:

context = Query(query)._compile_context()
str(context.statement.compile(bind=None))

the context.statement.compile will try to choose a dialect (in our case correctly identifying Postgres) and then execute the statement in the same way as it was done for the S variant:

dialect.statement_compiler(dialect, context.statement, bind=None)

To remind ourselves, the S originates from:

dialect = postgresql.dialect()
str(dialect.statement_compiler(dialect, query, bind=None))

This hints us that in the context there is something which changes the behaviour of the statement compiler. What does the dialect.statement_compiler do,? It is a constructor of a subclass of SQLCompiler, specialised in the inheritance process to match your dialect needs; for Postgres it ought to be PGCompiler.

NB: We can take a shortcut for the A:

dialect.statement_compiler(dialect, Query(query).statement, bind=None)

Let's compare the state of the compiled objects. This can be easily done by accessing the __dict__ attribute of the compilers:

with_subquery = dialect.statement_compiler(dialect, context.statement, bind=None)
no_subquery = dialect.statement_compiler(dialect, query, bind=None)
from deepdiff import DeepDiff 
DeepDiff(sub.__dict__, nosub.__dict__, ignore_order=True)

Importantly, the types of statements have changed. This is not unexpected as in the first instance, context.statement is a sqlalchemy.sql.selectable.Select object, while in the latter query is sqlalchemy.sql.selectable.Alias object.

This highlights the fact that conversion of a query to a Query object with db.session.query(), causes the compiler to take a different route based on the changed type of the statement. We can see that S is, in fact, an alias wrapped in a select using:

>>> context.statement._froms
[<sqlalchemy.sql.selectable.Alias at 0x7f7e2f4f7160; foo>]

The fact that the alias is rendered when wrapped in a select statement (S), creating a subquery is somehow consistent with the documentation which describes Alias as having use in SELECT statement (but not as the root of the query):

When an Alias is created from a Table object, this has the effect of the table being rendered as tablename AS aliasname in a SELECT statement.

Why was there a sub-select in the first place?

Let's name the query without .alias('foo') as N (no alias) and represent it in the pseudocode below as n_query. Because it is of type sqlalchemy.sql.selectable.Select when you called the db.session.query(n_query) it created a sub-query in much the same way as in the case with the alias. You can verify that we got a select inside of another select with:

>>> Query(nquery).statement._froms
[<sqlalchemy.sql.selectable.Select at 0x7f7e1e26e668; Select object>]

You should now easily see that having a select within a select means that a the sub-select has always been created when querying the database with db.session.query(n_query).

I am not sure why the first query that you show has a sub-query visible - would it be possible that you have used echo (or str(db.session(n_query)) back then?

Can I change this behaviour?

Sure! Just execute your query with:

db.session.execute(n_query)

and then (if you enabled echo as instructed above) you will see the same query (as you have posted at the very end) being emitted.

This is exactly the same as executing the aliased query:

db.session.execute(n_query.alias('foo'))

because the alias has no use if there is no consecutive select!

like image 139
krassowski Avatar answered Oct 20 '22 08:10

krassowski