Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to do subqueries in bigquery?

Im trying to play with the reddit data on bigquery and I want to see comments and replies in one row. I see bigquery supports subqueries, but I am unable to construct the query. I have to use a subquery to self join the same table because of the structure of the data, specifically i want to join id and parent_id together, but I need to modify id before I can join. Here is how im trying to do the query:

SELECT 
  p.subreddit, 
  p.body AS first_body,
  p.score AS first_score,
  CONCAT('t1_',p.id) AS first_id ,
  c.last_body,
  c.last_score,
  c.last_id 
FROM 
[fh-bigquery:reddit_comments.2016_01] p,
(
  SELECT 
    body AS last_body,
    score AS last_score,
    CONCAT('t1_',id) AS last_id,
    parent_id,
    author,
    body 
  FROM  [fh-bigquery:reddit_comments.2016_01] 
  WHERE body != '[deleted]' 
  AND author != '[deleted]' 
  AND score > 1
)  c
WHERE  p.first_id = c.parent_id  
AND p.score > 1 
AND  p.author != '[deleted]' 
AND p.body != '[deleted]';

The error I get is:

Field 'c.parent_id' not found in table 'fh-bigquery:reddit_comments.2016_01'; did you mean 'parent_id'?

Here is where you can run the query: https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.2016_01

Im not sure how to fix this. what is the proper way to join this and get this query to run?

like image 889
jas Avatar asked Nov 07 '25 04:11

jas


1 Answers

You might want to do something like below (just guess):

SELECT 
  p.subreddit, 
  p.body AS first_body,
  p.score AS first_score,
  CONCAT('t1_',p.id) AS first_id ,
  c.last_body,
  c.last_score,
  c.last_id 
FROM 
[fh-bigquery:reddit_comments.2016_01] p
JOIN (
  SELECT 
    body AS last_body,
    score AS last_score,
    CONCAT('t1_',id) AS last_id,
    parent_id,
    author,
    body 
  FROM  [fh-bigquery:reddit_comments.2016_01] 
  WHERE body != '[deleted]' 
  AND author != '[deleted]' 
  AND score > 1
)  c
ON  p.link_id = c.parent_id  
WHERE p.score > 1 
AND  p.author != '[deleted]' 
AND p.body != '[deleted]'
LIMIT 100

See more about JOINs

Please note, I just transformed your query to proper use JOINs, but logic of query is still for you to polish as you see needed

Added to address additional info in your comment:

SELECT 
  subreddit, 
  first_body,
  first_score,
  first_id ,
  last_body,
  last_score,
  last_id 
FROM (
  SELECT 
    subreddit, 
    body AS first_body,
    score AS first_score,
    CONCAT('t1_',id) AS first_id 
  FROM [fh-bigquery:reddit_comments.2016_01]
  WHERE score > 1 
  AND author != '[deleted]' 
  AND body != '[deleted]'
) p
JOIN (
  SELECT 
    body AS last_body,
    score AS last_score,
    CONCAT('t1_',id) AS last_id,
    parent_id,
    author,
    body 
  FROM  [fh-bigquery:reddit_comments.2016_01] 
  WHERE body != '[deleted]' 
  AND author != '[deleted]' 
  AND score > 1
)  c
ON  p.first_id = c.parent_id  
LIMIT 100  
like image 65
Mikhail Berlyant Avatar answered Nov 09 '25 19:11

Mikhail Berlyant



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!