Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine query that relies on resultset of another

Tags:

sql

mysql

I run this query to get 20 random items from my wordpress database based on things like rating, category, etc

 SELECT (A.user_votes/A.user_voters) as site_rating, B.ID as post_id, B.post_author, B.post_date,E.name as category
            FROM `wp_gdsr_data_article` as A 
                INNER JOIN `wp_posts` as B ON (A.post_id = B.id) 
                INNER JOIN wp_term_relationships C ON (B.ID = C.object_id) 
                INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id) 
                INNER JOIN wp_terms E ON (D.term_id = E.term_id) 
            WHERE 
                B.post_type = 'post' AND
                B.post_status = 'publish' AND
                D.taxonomy='category' AND
                E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP') 
            ORDER BY RAND()
            LIMIT 20

Then, for each result of the random items, I want to find a corresponding item that is very similar to the random item (around the same rating) but not identical and also one the user has not seen:

SELECT ABS($site_rating-(A.user_votes/A.user_voters)) as diff, (A.user_votes/A.user_voters) as site_rating, B.ID as post_id, B.post_author, B.post_date,E.name as category ,IFNULL(F.count,0) as count
            FROM `wp_gdsr_data_article` as A 
                INNER JOIN `wp_posts` as B ON (A.post_id = B.id) 
                INNER JOIN wp_term_relationships C ON (B.ID = C.object_id) 
                INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id) 
                INNER JOIN wp_terms E ON (D.term_id = E.term_id) 
                LEFT JOIN (
                    SELECT *,COUNT(*) as count FROM `verus` WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
                ) as F ON (A.post_id = F.post_id_winner OR A.post_id = F.post_id_loser)
            WHERE 
                E.name = '$category' AND
                B.ID <> '$post_id' AND
                B.post_type = 'post' AND
                B.post_status = 'publish' AND
                D.taxonomy='category' AND
                E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP') 
            ORDER BY count ASC, diff ASC
            LIMIT 1

Where the following php variables refer to the result of the previous query

    $post_id = $result['post_id'];
    $category = $result['category'];
    $site_rating = $result['site_rating'];

and $_SERVER['REMOTE_ADDR'] refers to the user's IP.

Is there a way to combine the first query with the 20 additional queries that need to be called to find corresponding items, so that I need just 1 or 2 queries?


Edit: Here is the view that simplifies the joins

CREATE VIEW `versus_random` AS
SELECT (A.user_votes/A.user_voters) as site_rating, B.ID as post_id, B.post_author, B.post_date,E.name as category 
FROM `wp_gdsr_data_article` as A 
    INNER JOIN `wp_posts` as B ON (A.post_id = B.id) 
    INNER JOIN wp_term_relationships C ON (B.ID = C.object_id) 
    INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id) 
    INNER JOIN wp_terms E ON (D.term_id = E.term_id)
WHERE 
    B.post_type = 'post' AND
    B.post_status = 'publish' AND
    D.taxonomy='category' AND
    E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP') 

My attempt now with the view:

 SELECT post_id, 
    (
            SELECT INNER_TABLE.post_id
            FROM `versus_random` as INNER_TABLE
            WHERE 
                INNER_TABLE.post_id <> OUTER_TABLE.post_id
            ORDER BY (SELECT COUNT(*) FROM `versus` WHERE ip = '54' AND (INNER_TABLE.post_id = post_id_winner OR INNER_TABLE.post_id = post_id_loser)) ASC
            LIMIT 1
        ) as innerquery

    FROM `versus_random` as OUTER_TABLE
    ORDER BY RAND()
    LIMIT 20

However the query just timesout and freezes my mysql.

like image 885
ParoX Avatar asked Nov 28 '15 18:11

ParoX


People also ask

How do you put a query result in another table?

From the Query Designer menu, point to Change Type, and then click Insert Results. In the Choose Target Table for Insert Results Dialog Box, select the table to copy rows to (the destination table).

Can you use 2 SELECT statements in SQL?

Put differently, UNION allows you to write two separate SELECT statements, and to have the results of one statement display in the same table as the results from the other statement. SQL has strict rules for appending data: Both tables must have the same number of columns.


2 Answers

I think it should work like this, but I don't have any Wordpress at hand to test it. The second query that gets the related post is embedded in the other query, when it gets just the related_post_id. The whole query is turned into a subquery itself, given the alias 'X' (although you are free to use 'G', if you want to continue your alphabet.)

In the outer query, the tables for posts and data-article are joined again (RA and RP) to query the relevant fields of the related post, based on the related_post_id from the inner query. These two tables are left joined (and in reverse order), so you still get the main post if no related post was found.

SELECT 
  X.site_rating,
  X.post_id,
  X.post_author,
  X.post_date,
  X.category,
  RA.user_votes / RA.user_voters as related_post_site_rating, 
  RP.ID as related_post_id, 
  RP.post_author as related_post_author, 
  RP.post_date as related_post_date,
  RP.name as related_category,
FROM       
  ( SELECT 
      (A.user_votes/A.user_voters) as site_rating, 
      B.ID as post_id, B.post_author, B.post_date,E.name as category,
      ( SELECT 
          RB.ID as post_id
        FROM `wp_gdsr_data_article` as RA 
            INNER JOIN `wp_posts` as RB ON (RA.post_id = RB.id) 
            INNER JOIN wp_term_relationships RC ON (RB.ID = RC.object_id) 
            INNER JOIN wp_term_taxonomy RD ON (RC.term_taxonomy_id = RD.term_taxonomy_id) 
            INNER JOIN wp_terms RE ON (RD.term_id = RE.term_id) 
            LEFT JOIN (
                SELECT *,COUNT(*) as count FROM `verus` WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
            ) as RF ON (RA.post_id = RF.post_id_winner OR RA.post_id = RF.post_id_loser)
        WHERE 
            RE.name = E.name AND
            RB.ID <> B.ID AND
            RB.post_type = 'post' AND
            RB.post_status = 'publish' AND
            RD.taxonomy='category' AND
            RE.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP') 
        ORDER BY count ASC, diff ASC
        LIMIT 1) as related_post_id

    FROM `wp_gdsr_data_article` as A 
        INNER JOIN `wp_posts` as B ON (A.post_id = B.id) 
        INNER JOIN wp_term_relationships C ON (B.ID = C.object_id) 
        INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id) 
        INNER JOIN wp_terms E ON (D.term_id = E.term_id)
    WHERE 
        B.post_type = 'post' AND
        B.post_status = 'publish' AND
        D.taxonomy='category' AND
        E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP') 
    ORDER BY RAND()
    LIMIT 20
  ) X
  LEFT JOIN `wp_posts` as RP ON RP.id = X.related_post_id 
  LEFT JOIN `wp_gdsr_data_article` as RA.post_id = RP.id
like image 73
GolezTrol Avatar answered Sep 29 '22 22:09

GolezTrol


I can't test my proposal so take it with the benefit of the doubt. Anyway i hope it could be a valid starting point for some of the issues faced.

I can not imagine a solution that does not pass through a temporary table, cabling onerous computations present in your queries. You could also have the goal to not interfere with the randomization of the first phase. In the following I try to clarify.

I'll start with these rewritings:

-- first query 
SELECT site_rating, post_id, post_author, post_date, category
    FROM POSTS_COMMON   
ORDER BY RAND()
LIMIT 20

-- second query 
SELECT ABS(R.site_rating_A - R.site_rating_B) as diff, R.site_rating_B as site_rating, P.post_id, P.post_author, P.post_date, P.category, F.count 
    FROM POSTS_COMMON AS P 
        INNER JOIN POSTS_RATING_DIFFS AS R ON (P.post_id = R.post_id_B)
        LEFT JOIN (     
            /* post_id_winner, post_id_loser explicited; COUNT(*) NULL treatment anticipated */
            SELECT post_id_winner, post_id_loser, IFNULL(COUNT(*), 0) as count FROM `verus` WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
        ) as F ON (P.post_id = F.post_id_winner OR P.post_id = F.post_id_loser)
    WHERE 
        P.category = '$category'
        AND R.post_id_A = '$post_id'
    ORDER BY count ASC, diff ASC
    LIMIT 1         

with:

SELECT A.post_id_A, B.post_id_B, A.site_rating as site_rating_A, B.site_rating as site_rating_B
INTO POSTS_RATING_DIFFS
    FROM POSTS_COMMON as A, POSTS_COMMON as B 
    WHERE A.post_id <> B.post_id AND A.category = B.category

CREATE VIEW POSTS_COMMON AS 
    SELECT  A.ID as post_id, A.user_votes, A.user_voters, (A.user_votes / A.user_voters) as site_rating, B.post_author, B.post_date, E.name as category
        FROM wp_gdsr_data_article` as A 
            INNER JOIN  `wp_posts` as B ON (A.post_id = B.post_id)
            INNER JOIN wp_term_relationships C ON (B.ID = C.object_id) 
            INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id) 
            INNER JOIN wp_terms E ON (D.term_id = E.term_id) 
        WHERE 
            B.post_type = 'post' AND
            B.post_status = 'publish' AND
            D.taxonomy='category' AND
            E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP') 

POSTS_COMMON isolates a common view between the two queries.

With POSTS_RATING_DIFFS, a temporary table populated with the ratings combinations and diffs, we have "the trick" of transforming the inequality join criteria on post_id(s) in an equality one (see R.post_id_A = '$post_id' in the second query).

We also take advantage of a temporary table in having precomputed ratings for the combinatory explosion of A.post_id <> B.post_id (with post category equality), and moreover being useful for other sessions.

Also extracting the RAND() ordering in a temporary table could be advantageous. In this case we could limit the ratings combinations and diffs only on the 20 randomly chosen.

Original limiting to one single row in the dependent second level query is done by mean of ordering and limit statements. The proposed solution avoids elaborating a LIMIT 1 on an ORDER BY resultset in the second level query wich become a subquery.

The single row calculation in the subquery is done by mean of a WHERE criteria on the maximum of a single value calculated from the columns values on which ORDER BY clause is used.

The combination into a single value must be valid in preserving the correct ordering. I'll leave in pseudo-code as:

'<combination of count and diff>'

For example, using combination of the two values into a string type, we could have:

CONCAT(LPAD(CAST(count AS CHAR), 10, '0'), LPAD(CAST(ABS(diff) AS CHAR), 20, '0'))

The structure of the single query would be:

SELECT (Q_LVL_1.user_votes/Q_LVL_1.user_voters) as site_rating_LVL_1, Q_LVL_1.post_id as post_id_LVL_1
    , Q_LVL_1.post_author as post_author_LVL_1, Q_LVL_1.post_date as post_date_LVL_1
    , Q_LVL_1.category as category_LVL_1, Q_LVL_2.post_id as post_id_LVL_2      
    , Q_LVL_2.diff as diff_LVL_2, Q_LVL_2.site_rating as site_rating_LVL_2
    , Q_LVL_2.post_author as post_author_LVL_2, Q_LVL_2.post_date as post_date_LVL_2
    , Q_LVL_2.count 

    FROM POSTS_COMMON AS Q_LVL_1
        , /* 1-row-selection query placed side by side for each Q_LVL_1's row */ 
        ( 
            SELECT CORE_P.post_id, CORE_P.ABS_diff as diff, P.site_rating, P.post_author, P.post_date, CORE_P.count 
                FROM POSTS_COMMON AS P 
                    INNER JOIN ( 
                        SELECT FIRST(CORE_P.post_id) as post_id, ABS(CORE_P.diff) as ABS_diff, CORE_P.count
                            FROM (
                                /* 
                                    selection of posts with post_id(s) different from first level query,
                                    not already taken and with the topmost value of
                                    '<combination of count and diff>'
                                */
                                ) AS CORE_P 
                            GROUP BY CORE_P.count, ABS(CORE_P.diff)
                            /* the one row selector */
                        ) AS CORE_ONE_LINER ON P.post_id = CORE_ONE_LINER.post_id                   
        ) AS Q_LVL_2

    ORDER BY RAND()
    LIMIT 20

CORE_P selection could have more post_id(s) corresponding to the topmost value '<combination of count and diff>', so the use of GROUP BY and FIRST clauses to reach the single row.

This brings to a possible final implementation:

SELECT (Q_LVL_1.user_votes/Q_LVL_1.user_voters) as site_rating_LVL_1, Q_LVL_1.post_id as post_id_LVL_1
    , Q_LVL_1.post_author as post_author_LVL_1, Q_LVL_1.post_date as post_date_LVL_1
    , Q_LVL_1.category as category_LVL_1, Q_LVL_2.post_id as post_id_LVL_2      
    , Q_LVL_2.diff as diff_LVL_2, Q_LVL_2.site_rating as site_rating_LVL_2
    , Q_LVL_2.post_author as post_author_LVL_2, Q_LVL_2.post_date as post_date_LVL_2
    , Q_LVL_2.count 

    FROM POSTS_COMMON AS Q_LVL_1
        , ( 
            SELECT CORE_P.post_id, CORE_P.ABS_diff as diff, P.site_rating, P.post_author, P.post_date, CORE_P.count 
                FROM POSTS_COMMON AS P 
                    INNER JOIN
                    (
                        SELECT FIRST(CORE_P.post_id) as post_id, ABS(CORE_P.diff) as ABS_diff, CORE_F.count

                            FROM (
                                SELECT CORE_RATING.post_id as post_id, ABS(CORE_RATING.diff) as ABS_diff, CORE_F.count
                                    FROM ( 
                                            SELECT post_id_B as post_id, site_rating_A - site_rating_B as diff
                                                FROM POSTS_RATING_DIFFS 
                                                WHERE POSTS_RATING_DIFFS.post_id_A = Q_LVL_1.post_id 
                                            ) as CORE_RATING
                                        LEFT JOIN ( 
                                            SELECT post_id_winner, post_id_loser, IFNULL(COUNT(*), 0) as count 
                                                FROM `verus` 
                                                WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
                                            ) as CORE_F ON (CORE_RATING.post_id = CORE_F.post_id_winner OR CORE_RATING.post_id = CORE_F.post_id_loser)
                                    WHERE 
                                        POSTS_RATING_DIFFS.post_id_A = Q_LVL_1.post_id                      
                                        AND '<combination of CORE_F.count and CORE_RATING.diff>'
                                            = MAX ( 
                                                SELECT '<combination of CORE_F_2.count and CORE_RATING_2.diff>'
                                                    FROM ( 
                                                        SELECT site_rating_A - site_rating_B as diff
                                                            FROM POSTS_RATING_DIFFS  
                                                            WHERE POSTS_RATING_DIFFS.post_id_A = Q_LVL_1.post_id  
                                                        ) as CORE_RATING_2
                                                    LEFT JOIN ( 
                                                        SELECT post_id_winner, post_id_loser, IFNULL(COUNT(*), 0) as count 
                                                            FROM `verus` 
                                                            WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
                                                        ) as CORE_F_2 ON (CORE_RATING_2.post_id = CORE_F_2.post_id_winner OR CORE_RATING_2.post_id = CORE_F_2.post_id_loser)    
                                                ) /* END MAX */
                                    ) AS CORE_P 
                                    GROUP BY CORE_P.count, ABS(CORE_P.diff)
                            ) AS CORE_ONE_LINER ON P.post_id = CORE_ONE_LINER.post_id

                ) AS Q_LVL_2

    ORDER BY RAND()
    LIMIT 20
like image 30
rfb Avatar answered Sep 29 '22 23:09

rfb