Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Pivot in Google BigQuery [duplicate]

Suppose I have the following query sent to BQ:

SELECT shipmentID, category, quantity
FROM [myDataset.myTable]

Further, suppose that the query returns data such as:

shipmentID  category  quantity
1           shoes     5
1           hats      3
2           shirts    1
2           hats      2
3           toys      3
2           books     1
3           shirts    1

How can I pivot the results, from within BQ, to produce output as follows:

 shipmentID   shoes  hats  shirts  toys  books
 1            5      3     0       0     0
 2            0      2     1       0     1
 3            0      0     1       3     0

As some additional background, I actually have 2000+ categories that I need to pivot, and the quantity of data is such that I can't do it directly through a Pandas DataFrame in Python (uses all the memory, then slows to a crawl). I tried using a relational database, but ran into a column limit, so I'd like to be able to do it directly in BQ, even if I have to build the query itself through python. Any suggestions?

** Edit 1 I should mention that pivoting the data itself can be done in chunks and is therefore not the issue. The real trouble comes in trying to do the aggregation afterwards, so that I have only one row for each shipmentID. That's what eats all the RAM.

** Edit 2 After trying out the accepted answer below, I found that trying to use it to create a 2k+ column pivot table was causing "Resources exceeded" errors. My BQ team was able to refactor the query to break it into smaller chunks and allow it to go through. The basic structure of the query is as follows:

SELECT
  SetA.*,
  SetB.*,
  SetC.*
FROM (
  SELECT
    shipmentID,
    SUM(IF (category="Rocks", qty, 0)),
    SUM(IF (category="Paper", qty, 0)),
    SUM(IF (category="Scissors", qty, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetA
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Jello Molds", quantity, 0)),
    SUM(IF (category="Torque Wrenches", quantity, 0))
  FROM (
    SELECT
      a.shipmentID shipmentID,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetB
ON
  SetA.shipmentid = SetB.shipmentid
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Deep Thoughts", qty, 0)),
    SUM(IF (category="Rainbows", qty, 0)),
    SUM(IF (category="Ponies", qty, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetC
ON
  SetB.shipmentID = SetC.shipmentID

The above pattern can be continued indefinitely by adding INNER JOIN EACH segments one after the other. For my application, BQ was able to handle about 500 columns per chunk.

like image 514
TraxusIV Avatar asked Jun 08 '15 16:06

TraxusIV


1 Answers

This is a way to do:

select shipmentID,
  sum(IF (category='shoes', quantity, 0)) AS shoes,
  sum(IF (category='hats', quantity, 0)) AS hats,
  sum(IF (category='shirts', quantity, 0)) AS shirts,
  sum(IF (category='toys', quantity, 0)) AS toys,
  sum(IF (category='books', quantity, 0)) AS books,
from
  (select 1 as shipmentID,           'shoes' as category,    5 as quantity),
  (select 1 as shipmentID,           'hats' as category,      3 as quantity),
  (select 2 as shipmentID,           'shirts' as category,    1 as quantity),
  (select 2 as shipmentID,           'hats' as category,      2 as quantity),
  (select 3 as shipmentID,           'toys' as category,      3 as quantity),
  (select 2 as shipmentID,           'books' as category,     1 as quantity),
  (select 3 as shipmentID,           'shirts' as category,    1 as quantity),
group by shipmentID

This returns:

+-----+------------+-------+------+--------+------+-------+---+
| Row | shipmentID | shoes | hats | shirts | toys | books |   |
+-----+------------+-------+------+--------+------+-------+---+
|   1 |          1 |     5 |    3 |      0 |    0 |     0 |   |
|   2 |          2 |     0 |    2 |      1 |    0 |     1 |   |
|   3 |          3 |     0 |    0 |      1 |    3 |     0 |   |
+-----+------------+-------+------+--------+------+-------+---+

See the manual for other pivot table example.

like image 198
Pentium10 Avatar answered Oct 21 '22 10:10

Pentium10