Unfortunately, reshaping in BigQuery is not as easy as in R, and I can't export my data for this project.
Here is the input:
date country A B C D
20170928 CH 3000.3 121 13 3200
20170929 CH 2800.31 137 23 1614.31
Expected output
date country Metric Value
20170928 CH A 3000.3
20170928 CH B 121
20170928 CH C 13
20170928 CH D 3200
20170929 CH A 2800.31
20170929 CH B 137
20170929 CH C 23
20170929 CH D 1614.31
Also, my table has many more columns and rows (but I assume a lot of manual work would be required anyway).
To flatten an entire column of ARRAYs while preserving the values of the other columns in each row, use a correlated cross join to join the table containing the ARRAY column to the UNNEST output of that ARRAY column.
OFFSET means that the numbering starts at zero; ORDINAL means that the numbering starts at one. A given array can be interpreted as either 0-based or 1-based. When accessing an array element, you must preface the array position with OFFSET or ORDINAL, respectively; there is no default behavior.
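For illustration, here is a minimal sketch of both ideas; the inline table t and its column arr are made up for this example:
#standardSQL
-- correlated CROSS JOIN UNNEST flattens the array while keeping the other columns
WITH t AS (
  SELECT 1 AS id, [10, 20, 30] AS arr
)
SELECT
  id,
  v AS flattened_value,                 -- one output row per array element
  arr[OFFSET(0)]  AS first_by_offset,   -- 0-based position
  arr[ORDINAL(1)] AS first_by_ordinal   -- 1-based position
FROM t
CROSS JOIN UNNEST(t.arr) AS v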
Below is for BigQuery Standard SQL and does not require repeating SELECTs depending on the number of columns. It will pick up as many columns as you have and transform them into metrics and values.
#standardSQL
SELECT DATE, country,
metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT DATE, country,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.yourtable` t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
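To see why this works: TO_JSON_STRING(t) serializes each row into a JSON string, something like
{"date":"20170928","country":"CH","A":3000.3,"B":121,"C":13,"D":3200}
(key casing follows your actual column names). The inner REGEXP_REPLACE strips the surrounding braces, SPLIT(..., ',') yields one "key":value pair per column, SPLIT(pair, ':') separates key from value, and the r'^"|"$' replacement removes the enclosing quotes. Note that this simple splitting assumes none of your values themselves contain commas or colons.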
You can test / play with the above using dummy data as in your question:
#standardSQL
WITH `project.dataset.yourtable` AS (
SELECT '20170928' DATE, 'CH' country, 3000.3 A, 121 B, 13 C, 3200 D UNION ALL
SELECT '20170929', 'CH', 2800.31, 137, 23, 1614.31
)
SELECT DATE, country,
metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT DATE, country,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.yourtable` t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
The result is as expected:
DATE country metric value
20170928 CH A 3000.3
20170928 CH B 121.0
20170928 CH C 13.0
20170928 CH D 3200.0
20170929 CH A 2800.31
20170929 CH B 137.0
20170929 CH C 23.0
20170929 CH D 1614.31
You need a UNION, which in BigQuery Legacy SQL is denoted with commas (a comma between tables acts as UNION ALL):
SELECT date, country, Metric, Value
FROM (
SELECT date, country, 'A' as Metric, A as Value FROM your_table
), (
SELECT date, country, 'B' as Metric, B as Value FROM your_table
), (
SELECT date, country, 'C' as Metric, C as Value FROM your_table
) , (
SELECT date, country, 'D' as Metric, D as Value FROM your_table
)
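In Standard SQL, the same reshaping is a plain UNION ALL (a sketch assuming the same your_table and columns as above; it still names every column explicitly):
#standardSQL
SELECT date, country, 'A' AS Metric, A AS Value FROM your_table UNION ALL
SELECT date, country, 'B', B FROM your_table UNION ALL
SELECT date, country, 'C', C FROM your_table UNION ALL
SELECT date, country, 'D', D FROM your_table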
Most answers that I managed to find required specifying the name of EVERY column to be melted. This is not tractable when I have hundreds to thousands of columns in the table. Here is an answer that works for an arbitrarily wide table.
It uses dynamic SQL to automatically extract the column names from the table schema, collate a command string, and then evaluate that string. This is intended to mimic the behavior of Python pandas.melt() / R reshape2::melt().
I intentionally did not create user-defined functions because of some undesirable properties of UDFs. Depending on how you use this, you may or may not want to do that.
Input:
id0 id1 _2020_05_27 _2020_05_28
1 1 11 12
1 2 13 14
2 1 15 16
2 2 17 18
Output:
id0 id1 date value
1 2 _2020_05_27 13
1 2 _2020_05_28 14
2 2 _2020_05_27 17
2 2 _2020_05_28 18
1 1 _2020_05_27 11
1 1 _2020_05_28 12
2 1 _2020_05_27 15
2 1 _2020_05_28 16
#standardSQL
-- PANDAS MELT FUNCTION IN GOOGLE BIGQUERY
-- author: Luna Huang
-- email: [email protected]
-- run this script with Google BigQuery Web UI in the Cloud Console
-- this piece of code functions like the pandas melt function
-- pandas.melt(id_vars, value_vars, var_name, value_name, col_level=None)
-- without utilizing user defined functions (UDFs)
-- see below for where to input corresponding arguments
DECLARE cmd STRING;
DECLARE subcmd STRING;
SET cmd = ("""
WITH original AS (
-- query to retrieve the original table
%s
),
nested AS (
SELECT
[
-- sub command to be automatically generated
%s
] as s,
-- equivalent to id_vars in pandas.melt()
%s,
FROM original
)
SELECT
-- equivalent to id_vars in pandas.melt()
%s,
-- equivalent to var_name in pandas.melt()
s.key AS %s,
-- equivalent to value_name in pandas.melt()
s.value AS %s,
FROM nested
CROSS JOIN UNNEST(nested.s) AS s
""");
SET subcmd = ("""
WITH
columns AS (
-- query to retrieve the column names
-- equivalent to value_vars in pandas.melt()
-- the resulting table should have only one column
-- with the name: column_name
%s
),
scs AS (
SELECT FORMAT("STRUCT('%%s' as key, %%s as value)", column_name, column_name) AS sc
FROM columns
)
SELECT ARRAY_TO_STRING(ARRAY (SELECT sc FROM scs), ",\\n")
""");
-- -- -- EXAMPLE BELOW -- -- --
-- SET UP AN EXAMPLE TABLE --
CREATE OR REPLACE TABLE `tmp.example`
(
id0 INT64,
id1 INT64,
_2020_05_27 INT64,
_2020_05_28 INT64
);
INSERT INTO `tmp.example` VALUES (1, 1, 11, 12);
INSERT INTO `tmp.example` VALUES (1, 2, 13, 14);
INSERT INTO `tmp.example` VALUES (2, 1, 15, 16);
INSERT INTO `tmp.example` VALUES (2, 2, 17, 18);
-- MELTING STARTS --
-- execute these two command to melt the table
-- the first generates the STRUCT commands
-- and saves a string in subcmd
EXECUTE IMMEDIATE FORMAT(
-- please do not change this argument
subcmd,
-- query to retrieve the column names
-- equivalent to value_vars in pandas.melt()
-- the resulting table should have only one column
-- with the name: column_name
"""
SELECT column_name
FROM `tmp.INFORMATION_SCHEMA.COLUMNS`
WHERE (table_name = "example") AND (column_name NOT IN ("id0", "id1"))
"""
) INTO subcmd;
-- the second implements the melting
EXECUTE IMMEDIATE FORMAT(
-- please do not change this argument
cmd,
-- query to retrieve the original table
"""
SELECT *
FROM `tmp.example`
""",
-- please do not change this argument
subcmd,
-- equivalent to id_vars in pandas.melt()
-- !!please type these twice!!
"id0, id1", "id0, id1",
-- equivalent to var_name in pandas.melt()
"date",
-- equivalent to value_name in pandas.melt()
"value"
);
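For the tmp.example table above, the first EXECUTE IMMEDIATE should leave roughly this fragment in subcmd (a sketch; exact whitespace and column order may differ):
STRUCT('_2020_05_27' as key, _2020_05_27 as value),
STRUCT('_2020_05_28' as key, _2020_05_28 as value)
The second EXECUTE IMMEDIATE then splices this fragment into the [ ... ] array in cmd, so each row of the original table becomes an array of key/value structs that the final CROSS JOIN UNNEST flattens into the melted output.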