I am building a query that performs some filtering on rating data.
Suppose I have a simple table called ratings
like the following, storing data from an online rating tool:
+----------------+----------------+--------+ | page_title | timestamp | rating | +----------------+----------------+--------+ | Abc | 20110417092134 | 1 | | Abc | 20110418110831 | 2 | | Def | 20110417092205 | 3 | +----------------+----------------+--------+
I need to extract pages with a high frequency of low values in the latest 10 ratings, and limit this query to pages that produced a volume of at least 20 ratings in the preceding week. This is the ridiculously long query I came up with:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM
(
SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title
AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
)
AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY a1.page_title HAVING COUNT(*) > 20
the top level query looks for pages with more than 20 ratings in the week terminating on 2011-04-24, the subquery is supposed to retrieve the number of ratings with values between [1,2] from the latest 10 ratings of each article from the top level query.
MySQL complains that a1.page_title in the WHERE clause of the subsubquery is an unknown column, I suspect this is because a1 is not defined as an alias in the second-level query, but only in the top-level query, but I am clueless how to fix this.
(edited)
I am adding as an explanation of my suspect above regarding cross-level referencing another query which works absolutely fine, note that here a1 is not defined in the subquery but it is in the immediate parent:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17' AND rating >=1
AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.aa_timestamp) >= '2011-04-11'
GROUP BY a1.page_title HAVING COUNT(*) > 20
Type of Subqueries Correlated subqueries : Reference one or more columns in the outer SQL statement. The subquery is known as a correlated subquery because the subquery is related to the outer SQL statement.
You can place the Subquery in a number of SQL clauses: WHERE clause, HAVING clause, FROM clause. Subqueries can be used with SELECT, UPDATE, INSERT, DELETE statements along with expression operator. It could be equality operator or comparison operator such as =, >, =, <= and Like operator.
A subquery cannot contain a BETWEEN or LIKE clause. A subquery cannot contain an ORDER BY clause. A subquery in an UPDATE statement cannot retrieve data from the same table in which data is to be updated. A subquery in a DELETE statement cannot retrieve data from the same table in which data is to be deleted.
In a SQL database query, a correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query.
I think you might consider joining two in line views it might make things eaiser.
SELECT *
FROM (SELECT COUNT(*),
a2.page_title
FROM ratings a2
WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17'
AND rating >= 1
AND rating <= 2
GROUP BY a2.page_title) current
JOIN
(SELECT a1.page_title,
COUNT(*) AS rvol,
AVG(a1.rating) AS theavg
FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17'
AND DATE(a1.a_timestamp) >= '2011-04-11'
GROUP BY a1.page_title
HAVING COUNT(*) > 20) morethan20
ON current .page_title = morethan20.page_title
If all you have is this one simple table, I have no idea where you are pulling all of these other table names from, such as: a1, a2, ratings. I feel like either your SQL is quite a bit off, or your leaving out information.
The reason your having the error you do is because in your sub-sub-query you do not include a1 in your "FROM" statement... as so that table is not included, it cannot be referenced in your WHERE clause in that sub query.
SELECT *
FROM
(SELECT *
FROM a1
WHERE a1.timestamp <= (NOW()-604800)
AND a1.timestamp >= (NOW()-1209600)
GROUP BY a1.page_title
HAVING COUNT(a1.page_title)>20)
AS priorWeekCount
WHERE
rating <= 2
ORDER BY timestamp DESC
LIMIT 10
as I dont have a full table to test this... I THINK this is what your looking for.. but it is untested, and knowing my coding habits, very rarely is what I type 100% perfect first time ;)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With