Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL subquery to refer to field in parent query

Tags:

mysql

subquery

I am building a query that performs some filtering on rating data.

Suppose I have a simple table called ratings like the following, storing data from an online rating tool:

+----------------+----------------+--------+
| page_title     | timestamp      | rating |
+----------------+----------------+--------+
| Abc            | 20110417092134 | 1      |
| Abc            | 20110418110831 | 2      |
| Def            | 20110417092205 | 3      |
+----------------+----------------+--------+

I need to extract pages with a high frequency of low values in the latest 10 ratings, and limit this query to pages that produced a volume of at least 20 ratings in the preceding week. This is the ridiculously long query I came up with:

SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg, 
(
     SELECT COUNT(*) FROM
     (
         SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title 
         AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
     ) 
     AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17" 
GROUP BY a1.page_title HAVING COUNT(*) > 20

the top level query looks for pages with more than 20 ratings in the week terminating on 2011-04-24, the subquery is supposed to retrieve the number of ratings with values between [1,2] from the latest 10 ratings of each article from the top level query.

MySQL complains that a1.page_title in the WHERE clause of the subsubquery is an unknown column, I suspect this is because a1 is not defined as an alias in the second-level query, but only in the top-level query, but I am clueless how to fix this.

(edited)

I am adding as an explanation of my suspect above regarding cross-level referencing another query which works absolutely fine, note that here a1 is not defined in the subquery but it is in the immediate parent:

SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg, 
(
    SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
    AND DATE(timestamp) >= '2011-04-17' AND rating >=1 
    AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1 
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.aa_timestamp) >= '2011-04-11' 
GROUP BY a1.page_title HAVING COUNT(*) > 20
like image 469
radrat Avatar asked Apr 25 '11 17:04

radrat


People also ask

What type of subquery contains a reference to a column in the outer query?

Type of Subqueries Correlated subqueries : Reference one or more columns in the outer SQL statement. The subquery is known as a correlated subquery because the subquery is related to the outer SQL statement.

Can you use subquery in a WHERE clause of the SQL query?

You can place the Subquery in a number of SQL clauses: WHERE clause, HAVING clause, FROM clause. Subqueries can be used with SELECT, UPDATE, INSERT, DELETE statements along with expression operator. It could be equality operator or comparison operator such as =, >, =, <= and Like operator.

Can subquery retrieve data from inner query used in WHERE clause?

A subquery cannot contain a BETWEEN or LIKE clause. A subquery cannot contain an ORDER BY clause. A subquery in an UPDATE statement cannot retrieve data from the same table in which data is to be updated. A subquery in a DELETE statement cannot retrieve data from the same table in which data is to be deleted.

Which subquery use value from outer query?

In a SQL database query, a correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query.


2 Answers

I think you might consider joining two in line views it might make things eaiser.

SELECT * 
FROM   (SELECT COUNT(*), 
               a2.page_title 
        FROM   ratings a2 
        WHERE  DATE(timestamp) <= '2011-04-24' 
               AND DATE(timestamp) >= '2011-04-17' 
               AND rating >= 1 
               AND rating <= 2 

        GROUP  BY a2.page_title) current 
       JOIN 
        (SELECT a1.page_title, 
                    COUNT(*)       AS rvol, 
                    AVG(a1.rating) AS theavg 
             FROM   ratings a1 
             WHERE  DATE(a1.timestamp) <= '2011-04-17' 
                    AND DATE(a1.a_timestamp) >= '2011-04-11' 
             GROUP  BY a1.page_title 
             HAVING COUNT(*) > 20) morethan20 
         ON current .page_title = morethan20.page_title 
like image 103
Conrad Frix Avatar answered Oct 06 '22 22:10

Conrad Frix


If all you have is this one simple table, I have no idea where you are pulling all of these other table names from, such as: a1, a2, ratings. I feel like either your SQL is quite a bit off, or your leaving out information.

The reason your having the error you do is because in your sub-sub-query you do not include a1 in your "FROM" statement... as so that table is not included, it cannot be referenced in your WHERE clause in that sub query.

SELECT * 
FROM
    (SELECT *
        FROM a1
        WHERE a1.timestamp <= (NOW()-604800)
            AND a1.timestamp >= (NOW()-1209600)
        GROUP BY a1.page_title
        HAVING COUNT(a1.page_title)>20)
    AS priorWeekCount
WHERE
    rating <= 2
ORDER BY timestamp DESC
LIMIT 10

as I dont have a full table to test this... I THINK this is what your looking for.. but it is untested, and knowing my coding habits, very rarely is what I type 100% perfect first time ;)

like image 35
CenterOrbit Avatar answered Oct 06 '22 23:10

CenterOrbit