Here is my BigQuery query:
SELECT word,word_count,corpus_date FROM
[publicdata:samples.shakespeare]
WHERE word="the" ORDER BY word_count asc
which gives this output:
Row word word_count corpus_date
1 the 57 1609
2 the 106 0
3 the 287 1609
4 the 353 1594
5 the 363 0
6 the 399 1592
7 the 421 1611
I want the data to be grouped by corpus_date. I tried adding a GROUP BY on corpus_date:
SELECT word,word_count,corpus_date FROM
[publicdata:samples.shakespeare]
WHERE word="the" group by corpus_date
ORDER BY word_count asc
but it didn't allow me to group by corpus_date. Is there any way to get the data grouped by corpus_date?
The GROUP BY clause allows you to group rows that have the same values for a given field. You can then perform aggregate functions on each of the groups. GROUP BY is used only in a SELECT statement and comes after the WHERE clause in the query: grouping happens after the WHERE filter is applied and before the expressions in the SELECT clause are computed.
ARRAY_AGG (available in BigQuery standard SQL) returns an ARRAY of expression values. Its optional arguments are described in the BigQuery documentation on aggregate function calls.
You'll need to GROUP BY all non-aggregated columns in your query. However, since you are looking for a single word, you don't need to show or even GROUP BY that word in the result set (it's implicitly selected by the word="the" condition).
Therefore, if you want the total sum of word counts for the word "the" grouped by date, you can run something like this:
SELECT
SUM(word_count) as sum_for_the,
corpus_date
FROM
[publicdata:samples.shakespeare]
WHERE
word="the"
GROUP BY
corpus_date
ORDER BY
sum_for_the ASC;
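The query above uses BigQuery's legacy SQL table syntax. For anyone on the standard SQL dialect, here is a rough equivalent. It assumes the same sample table is reachable as `bigquery-public-data.samples.shakespeare`; treat it as a sketch rather than the one right way to write it.

```sql
-- Standard SQL sketch of the per-date total for the word "the".
-- Assumption: the sample table is addressed as
-- `bigquery-public-data.samples.shakespeare` in standard SQL.
SELECT
  corpus_date,
  SUM(word_count) AS sum_for_the  -- aggregated, so it stays out of GROUP BY
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  word = 'the'
GROUP BY
  corpus_date                     -- the only non-aggregated column selected
ORDER BY
  sum_for_the ASC;
```

Every column in the SELECT list is either aggregated or listed in GROUP BY, which is the rule the original query broke.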
That's not super useful on its own, so if you want to do something more involved, such as learning which corpus each count per date comes from, SUM the counts of the word and list the corpora using a query like this:
SELECT
SUM(word_count) AS sum_for_the, corpus, corpus_date
FROM
[publicdata:samples.shakespeare]
WHERE
word="the"
GROUP BY
corpus_date, corpus
ORDER BY
sum_for_the ASC;
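If what you are really after is the original rows kept as they are, just arranged by date with a per-date total alongside, a window function might be closer to it. This is a standard SQL sketch, not something from the question; the `date_total` alias and the `bigquery-public-data.samples.shakespeare` table path are my own assumptions.

```sql
-- Standard SQL sketch: keep one row per corpus and attach the total for its date.
-- Assumptions: standard SQL dialect; date_total is an illustrative alias.
SELECT
  corpus_date,
  corpus,
  word_count,
  SUM(word_count) OVER (PARTITION BY corpus_date) AS date_total
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  word = 'the'
ORDER BY
  corpus_date ASC,
  word_count ASC;
```

No GROUP BY is needed here because the window function computes the total without collapsing rows.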
For listing all volumes that a word appeared in per year, I like to use the GROUP_CONCAT function. The word "the" appears in everything, so it's probably not as interesting as a less common word like "swagger" (which is one of many words invented by Shakespeare).
SELECT
SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
[publicdata:samples.shakespeare]
WHERE
word="swagger"
GROUP BY
corpus_date
ORDER BY
corpus_date ASC;
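GROUP_CONCAT is a legacy SQL function. In standard SQL the closest counterpart I know of is STRING_AGG, so a rough translation could look like the following. The table path and the choice to sort the corpora alphabetically inside each list are my assumptions.

```sql
-- Standard SQL sketch: list the corpora containing "swagger" for each date.
-- Assumptions: standard SQL dialect; corpora sorted alphabetically within each list.
SELECT
  corpus_date,
  SUM(word_count) AS word_sum,
  STRING_AGG(corpus, ',' ORDER BY corpus) AS corpora
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  word = 'swagger'
GROUP BY
  corpus_date
ORDER BY
  corpus_date ASC;
```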
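Even more fun is to look at variations of a word (CONTAINS matches any word that includes the given string, not just prefixes) and GROUP BY each variation and date, listing the volumes it appears in: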
SELECT
word, SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
[publicdata:samples.shakespeare]
WHERE
word CONTAINS "swagger"
GROUP BY
word, corpus_date
ORDER BY
corpus_date ASC
IGNORE CASE;
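CONTAINS and IGNORE CASE are both legacy SQL features. A standard SQL approximation could lower-case the word and use LIKE instead, as sketched below. It assumes the `bigquery-public-data.samples.shakespeare` table path, and it only makes the filter case-insensitive; rows are still grouped by the word as stored, which may not match the legacy behaviour exactly.

```sql
-- Standard SQL sketch: case-insensitive substring match on "swagger".
-- Assumption: LOWER(...) LIKE stands in for legacy CONTAINS ... IGNORE CASE.
SELECT
  word,
  corpus_date,
  SUM(word_count) AS word_sum,
  STRING_AGG(corpus, ',') AS corpora
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  LOWER(word) LIKE '%swagger%'
GROUP BY
  word,
  corpus_date
ORDER BY
  corpus_date ASC;
```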
Check out the BigQuery Query Language reference and the BigQuery Cookbook for more examples.