
Doing a GROUP BY in Google BigQuery

Tags:

google-bigquery

Here is my BigQuery query:

    SELECT word, word_count, corpus_date
    FROM [publicdata:samples.shakespeare]
    WHERE word = "the"
    ORDER BY word_count ASC

which gives this output:

    Row word    word_count corpus_date   
    1   the       57       1609  
    2   the       106      0     
    3   the       287      1609  
    4   the       353      1594  
    5   the       363      0     
    6   the       399      1592  
    7   the       421      1611

I want the data to be grouped by corpus_date. I tried adding a GROUP BY corpus_date clause:

    SELECT word, word_count, corpus_date
    FROM [publicdata:samples.shakespeare]
    WHERE word = "the"
    GROUP BY corpus_date
    ORDER BY word_count ASC

but BigQuery didn't allow the GROUP BY on corpus_date. Is there any way to get the data grouped by corpus_date?


asked Nov 24 '12 22:11

iJade

1 Answer

You'll need to GROUP BY every non-aggregated value in your query. Your query selects word and word_count without grouping or aggregating them, which is why the GROUP BY on corpus_date alone is rejected. However, since you are looking for a single word, you don't need to show or even GROUP BY that word in the result set (it's already fixed by the word="the" clause).

Therefore, if you want the total sum of word counts for the word "the" grouped by date, you can run something like this:

SELECT
  SUM(word_count) as sum_for_the,
  corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date
ORDER BY
  sum_for_the ASC;
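To try the same grouping without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under some assumptions: the table holds just the seven rows shown in the question, not the full sample dataset, and SQLite is a different engine from BigQuery (for instance, it is more lenient about non-aggregated columns). The table name `shakespeare` and the helper `sum_by_date` are made up for this example.

```python
import sqlite3

# The seven (word, word_count, corpus_date) rows shown in the question's output.
ROWS = [
    ("the", 57, 1609),
    ("the", 106, 0),
    ("the", 287, 1609),
    ("the", 353, 1594),
    ("the", 363, 0),
    ("the", 399, 1592),
    ("the", 421, 1611),
]


def sum_by_date(rows):
    """Return (sum_for_the, corpus_date) pairs, smallest sum first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shakespeare "
            "(word TEXT, word_count INTEGER, corpus_date INTEGER)"
        )
        conn.executemany("INSERT INTO shakespeare VALUES (?, ?, ?)", rows)
        # Every selected column is either aggregated (SUM) or listed in GROUP BY.
        query = """
            SELECT SUM(word_count) AS sum_for_the, corpus_date
            FROM shakespeare
            WHERE word = 'the'
            GROUP BY corpus_date
            ORDER BY sum_for_the ASC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for total, date in sum_by_date(ROWS):
        print(date, total)
```

With these seven rows, the two 1609 entries collapse into one row (57 + 287) and the two entries dated 0 collapse into another (106 + 363), so five rows come back instead of seven.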

That's not super useful on its own, so if you want to do something more involved, such as learning which corpus each per-date count comes from, SUM the counts of the word and list the corpora using a query like this:

SELECT
  SUM(word_count) AS sum_for_the, corpus, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="the"
GROUP BY
  corpus_date, corpus
ORDER BY
  sum_for_the ASC;

For listing all volumes that a word appeared in per year, I like to use the GROUP_CONCAT function. The word "the" appears in everything, so it's probably not as interesting as a less common word like "swagger" (one of many words invented by Shakespeare).

SELECT
  SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word="swagger"
GROUP BY
  corpus_date
ORDER BY
  corpus_date ASC;
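The effect of concatenating group members can also be sketched locally with SQLite, which happens to provide a `group_concat` aggregate of its own. This is an illustration only: the rows below are hypothetical placeholders (the corpus names, counts, and dates are not taken from the real dataset), the helper `corpora_by_date` is made up, and SQLite's function may differ in detail from BigQuery's GROUP_CONCAT.

```python
import sqlite3

# HYPOTHETICAL rows (word, word_count, corpus, corpus_date): placeholders only,
# not real values from publicdata:samples.shakespeare.
SWAGGER_ROWS = [
    ("swagger", 1, "play_a", 1600),
    ("swagger", 2, "play_b", 1600),
    ("swagger", 1, "play_c", 1605),
]


def corpora_by_date(rows):
    """Return (word_sum, sorted corpus names, corpus_date) per date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shakespeare "
            "(word TEXT, word_count INTEGER, corpus TEXT, corpus_date INTEGER)"
        )
        conn.executemany("INSERT INTO shakespeare VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT SUM(word_count) AS word_sum,
                   group_concat(corpus) AS corpora,
                   corpus_date
            FROM shakespeare
            WHERE word = 'swagger'
            GROUP BY corpus_date
            ORDER BY corpus_date ASC
        """
        # SQLite joins the names with commas in no guaranteed order,
        # so split and sort them to get a stable result.
        return [
            (total, sorted(names.split(",")), date)
            for total, names, date in conn.execute(query)
        ]
    finally:
        conn.close()


if __name__ == "__main__":
    for total, names, date in corpora_by_date(SWAGGER_ROWS):
        print(date, total, names)
```

Each date ends up as a single row carrying both the summed count and the list of corpora that contributed to it.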

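Even more fun is to match every word that contains a given string (CONTAINS matches anywhere in the word, not only at the start), and GROUP BY each variation of the word and date, listing the volumes it appears in: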

SELECT
  word, SUM(word_count) AS word_sum, GROUP_CONCAT(corpus) as corpora, corpus_date
FROM
  [publicdata:samples.shakespeare]
WHERE
  word CONTAINS "swagger"
GROUP BY
  word, corpus_date
ORDER BY
  corpus_date ASC
IGNORE CASE;

Check out the BigQuery Query Language reference and the BigQuery Cookbook for more examples.

answered Oct 13 '22 06:10

Michael Manoochehri
