Doing a GROUP BY in Google BigQuery

Here is my BigQuery query:

    SELECT word, word_count, corpus_date
    FROM [publicdata:samples.shakespeare]
    WHERE word="the"
    ORDER BY word_count ASC

which gives this output:

    Row word    word_count corpus_date   
    1   the       57       1609  
    2   the       106      0     
    3   the       287      1609  
    4   the       353      1594  
    5   the       363      0     
    6   the       399      1592  
    7   the       421      1611  

I want the data to be grouped by corpus_date. I tried adding a GROUP BY corpus_date:

    SELECT word, word_count, corpus_date
    FROM [publicdata:samples.shakespeare]
    WHERE word="the"
    GROUP BY corpus_date
    ORDER BY word_count ASC

but BigQuery didn't allow the GROUP BY corpus_date. Is there any way to get the data grouped by corpus_date?

Asked Nov 24 '12 22:11 by iJade



1 Answer

You need to GROUP BY every non-aggregated column in your SELECT list. Your query fails because word and word_count are selected but are neither grouped nor aggregated. However, since you are looking for a single word, you don't need to show or even GROUP BY that word in the result set (it's implicitly selected by the word="the" clause).

Therefore, if you want the total sum of word counts for the word "the" grouped by date, you can run something like this:

SELECT
  SUM(word_count) as sum_for_the,
  corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date
ORDER BY
  sum_for_the ASC;
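
To see what this grouping produces without a BigQuery account, here is a minimal sketch in Python that runs an equivalent query in an in-memory SQLite database. It is a stand-in, not BigQuery itself: the table name `shakespeare` and the helper `sum_by_date` are assumptions made for illustration, and the data is only the seven rows shown in the question, not the full public dataset.

```python
import sqlite3

# The seven rows shown in the question: (word, word_count, corpus_date).
ROWS = [
    ("the", 57, 1609),
    ("the", 106, 0),
    ("the", 287, 1609),
    ("the", 353, 1594),
    ("the", 363, 0),
    ("the", 399, 1592),
    ("the", 421, 1611),
]


def sum_by_date(rows):
    """Return (sum_for_the, corpus_date) pairs, smallest sum first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical local table standing in for the public sample table.
        conn.execute(
            "CREATE TABLE shakespeare "
            "(word TEXT, word_count INTEGER, corpus_date INTEGER)"
        )
        conn.executemany("INSERT INTO shakespeare VALUES (?, ?, ?)", rows)
        query = """
            SELECT SUM(word_count) AS sum_for_the, corpus_date
            FROM shakespeare
            WHERE word = 'the'
            GROUP BY corpus_date
            ORDER BY sum_for_the ASC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for total, date in sum_by_date(ROWS):
        print(date, total)
```

Each corpus_date appears once in the result, with the counts of all rows sharing that date added together (for example, 57 and 287 for 1609).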

The per-date total is not very useful on its own. For something more involved, such as learning which corpus each count comes from, SUM the counts of the word and group by both corpus and date with a query like this:

SELECT
  SUM(word_count) AS sum_for_the, corpus, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date, corpus
ORDER BY
  sum_for_the ASC;

For listing all volumes that a word appeared in per year, I like to use the GROUP_CONCAT function. The word "the" appears in everything, so it's probably not as interesting as a less common word like "swagger" (one of many words often credited to Shakespeare).

SELECT
  SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="swagger"
GROUP BY
  corpus_date
ORDER BY
  corpus_date ASC;
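
The effect of GROUP_CONCAT can be sketched locally with SQLite, which has a function of the same name that also joins values with commas by default. This is an illustration only: the rows below are hypothetical (the names `play_a`, `play_b`, `play_c` and all counts are made up, not taken from the real dataset), and the table name `shakespeare` and helper `corpora_by_date` are assumptions.

```python
import sqlite3

# Hypothetical rows: (word, word_count, corpus, corpus_date).
# These are invented for illustration, not values from the public dataset.
ROWS = [
    ("swagger", 1, "play_a", 1595),
    ("swagger", 2, "play_b", 1595),
    ("swagger", 1, "play_c", 1600),
    ("other", 9, "play_a", 1595),
]


def corpora_by_date(rows, word):
    """Return (word_sum, [corpora], corpus_date) per date for one word."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shakespeare "
            "(word TEXT, word_count INTEGER, corpus TEXT, corpus_date INTEGER)"
        )
        conn.executemany("INSERT INTO shakespeare VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT SUM(word_count) AS word_sum,
                   GROUP_CONCAT(corpus) AS corpora,
                   corpus_date
            FROM shakespeare
            WHERE word = ?
            GROUP BY corpus_date
            ORDER BY corpus_date ASC
        """
        result = conn.execute(query, (word,)).fetchall()
    finally:
        conn.close()
    # SQLite does not promise an order inside the concatenated string,
    # so split it and sort the names to get a stable result.
    return [
        (word_sum, sorted(corpora.split(",")), date)
        for word_sum, corpora, date in result
    ]


if __name__ == "__main__":
    for word_sum, corpora, date in corpora_by_date(ROWS, "swagger"):
        print(date, word_sum, corpora)
```

Each date yields one row whose corpora column lists every corpus in that group, while the counts are summed as before.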

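Even more fun is to match every word that contains "swagger" (CONTAINS matches the text anywhere in the word, not only as a prefix), then GROUP BY each variation of the word and date. IGNORE CASE makes the string comparison case-insensitive: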

SELECT
  word, SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word CONTAINS "swagger"
GROUP BY
  word, corpus_date
ORDER BY
  corpus_date ASC
IGNORE CASE;
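
As a rough local approximation of the CONTAINS query, the sketch below does a case-insensitive substring match in SQLite using `INSTR` and `LOWER`. These are SQLite functions, not the BigQuery operators, so the behavior is only assumed to be comparable for plain ASCII words. The rows are hypothetical, the helper `variations_by_date` and table name `shakespeare` are assumptions, and the extra `word ASC` sort key is added here only to make the output order deterministic.

```python
import sqlite3

# Hypothetical rows: (word, word_count, corpus, corpus_date).
# Invented for illustration, not values from the public dataset.
ROWS = [
    ("swagger", 1, "play_a", 1595),
    ("swagger", 2, "play_b", 1595),
    ("Swaggering", 1, "play_b", 1595),
    ("swaggerers", 1, "play_c", 1600),
    ("dagger", 5, "play_a", 1595),
]


def variations_by_date(rows, fragment):
    """Group every word containing `fragment` (any case) by word and date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shakespeare "
            "(word TEXT, word_count INTEGER, corpus TEXT, corpus_date INTEGER)"
        )
        conn.executemany("INSERT INTO shakespeare VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT word,
                   SUM(word_count) AS word_sum,
                   GROUP_CONCAT(corpus) AS corpora,
                   corpus_date
            FROM shakespeare
            WHERE INSTR(LOWER(word), LOWER(?)) > 0
            GROUP BY word, corpus_date
            ORDER BY corpus_date ASC, word ASC
        """
        result = conn.execute(query, (fragment,)).fetchall()
    finally:
        conn.close()
    # Sort the concatenated corpus names so the result is stable.
    return [
        (word, word_sum, sorted(corpora.split(",")), date)
        for word, word_sum, corpora, date in result
    ]


if __name__ == "__main__":
    for row in variations_by_date(ROWS, "swagger"):
        print(row)
```

Because the grouping key is the pair (word, corpus_date), each spelling gets its own row per date, and words such as "dagger" that do not contain the fragment are filtered out before grouping.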

Check out the BigQuery Query Language reference and the BigQuery Cookbook for more examples.

Answered Oct 13 '22 06:10 by Michael Manoochehri