I have a table that looks like this:
I am looking for a table that gives a frequency count of the elements in the fields l_0, l_1, l_2, l_3
.
For example the output should look like this:
| author_id | year | l_o.name | l_0.count| l1.name | l1.count | l2.name | l2.count| l3.name | l3.count|
| 2164089123 | 1987 | biology | 3 | botany | 3 | | | | |
| 2595831531 | 1987 | computer science | 2 | simulation | 2 | computer simulation | 2 | mathematical model | 2 |
Edit:
In some cases the array field might have more than one type of element. For example l_0
could be ['biology', 'biology', 'geometry', 'geometry']
. In that case the output for fields l_0, l_1, l_2, l_3
would be a nested repeated field with all the elements in l_0.name
and all the corresponding counts in the l_0.count
.
The use of the COUNT function in Bigquery is to return a single value from the number of rows in the input. The DISTINCT clause with COUNT is used only to eliminate any duplicate row in the table.
To convert an ARRAY into a set of rows, also known as "flattening," use the UNNEST operator. UNNEST takes an ARRAY and returns a table with a single row for each element in the ARRAY . Because UNNEST destroys the order of the ARRAY elements, you may wish to restore order to the table.
BigQuery ARRAY_TO_STRING(): Converting an Array to a String You can use the ARRAY_TO_STRING() function to convert a single ARRAY<STRING> to a single STRING value.
This should work, assuming you want to count on a per-array basis:
SELECT
author_id,
year,
(SELECT AS STRUCT ANY_VALUE(l_0) AS name, COUNT(*) AS count
FROM UNNEST(l_0) AS l_0) AS l_0,
(SELECT AS STRUCT ANY_VALUE(l_1) AS name, COUNT(*) AS count
FROM UNNEST(l_1) AS l_1) AS l_1,
(SELECT AS STRUCT ANY_VALUE(l_2) AS name, COUNT(*) AS count
FROM UNNEST(l_2) AS l_2) AS l_2,
(SELECT AS STRUCT ANY_VALUE(l_3) AS name, COUNT(*) AS count
FROM UNNEST(l_3) AS l_3) AS l_3
FROM YourTable;
To avoid so much repetition, you can make use of a SQL UDF:
CREATE TEMP FUNCTION GetNameAndCount(elements ARRAY<STRING>) AS (
(SELECT AS STRUCT ANY_VALUE(elem) AS name, COUNT(*) AS count
FROM UNNEST(elements) AS elem)
);
SELECT
author_id,
year,
GetNameAndCount(l_0) AS l_0,
GetNameAndCount(l_1) AS l_1,
GetNameAndCount(l_2) AS l_2,
GetNameAndCount(l_3) AS l_3
FROM YourTable;
If you potentially need to account for multiple different names within an array, you can have the UDF return an array of them with associated counts instead:
CREATE TEMP FUNCTION GetNamesAndCounts(elements ARRAY<STRING>) AS (
ARRAY(
SELECT AS STRUCT elem AS name, COUNT(*) AS count
FROM UNNEST(elements) AS elem
GROUP BY elem
ORDER BY count
)
);
SELECT
author_id,
year,
GetNamesAndCounts(l_0) AS l_0,
GetNamesAndCounts(l_1) AS l_1,
GetNamesAndCounts(l_2) AS l_2,
GetNamesAndCounts(l_3) AS l_3
FROM YourTable;
Note that if you want to perform counting across rows, however, you'll need to unnest the arrays and perform the GROUP BY
at the outer level, but it doesn't look like this is your intention based on the question.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With