I am trying to join two tables that each have an array column, like the following:
SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array
or
SELECT a.id, b.value
FROM a INNER JOIN b
ON UNNEST(a.array) IN UNNEST(b.array)
According to this SO question, Postgres has the operators <@ and @>, which test whether one array is a subset of the other (Postgres doc page), but BigQuery only allows a single element to be compared against an array, like the following:
a.arrayelement IN UNNEST(b.array)
Can it be done in BigQuery?
edit
This is the schema I am working with
WITH b AS (
  { "ip": "192.168.1.1",
    "cookie": [
      { "key": "apple",
        "value": "red"
      },
      { "key": "peach",
        "value": "pink"
      },
      { "key": "orange",
        "value": "orange"
      }
    ]
  }
  ,{ "ip": "192.168.1.2",
    "cookie": [
      { "key": "apple",
        "value": "red"
      },
      { "key": "orange",
        "value": "orange"
      }
    ]
  }
),
a AS (
  { "id": "12345",
    "cookie": [
      { "key": "peach",
        "value": "pink"
      }
    ]
  }
  ,{ "id": "67890",
    "cookie": [
      { "key": "apple",
        "value": "red"
      },
      { "key": "orange",
        "value": "orange"
      }
    ]
  }
)
I am expecting an output like the following
ip, id
192.168.1.1, 67890
192.168.1.2, 67890
192.168.1.1, 12345
This is a continuation of the SO question "How do I find elements in an array in BigQuery". I tried using subqueries to compare a single element of one array against the other array, but BigQuery returns an error saying that I have "too many subqueries".
To declare a specific data type for an array, use angle brackets ( < and > ). For example: SELECT ARRAY<FLOAT64>[1, 2, 3] as floats; Arrays of most data types, such as INT64 or STRING , don't require that you declare them first.
ARRAY_AGG. Returns an ARRAY of expression values. To learn more about the optional arguments in this function and how to use them, see Aggregate function calls. To learn more about the OVER clause and how to use it, see Window function calls.
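As a small illustration of ARRAY_AGG, the sketch below collapses one-row-per-key input into one array per id. The `pairs` table and its rows are hypothetical sample data, not part of the question.

```sql
#standardSQL
-- Hypothetical sample rows: one (id, key) pair per row.
WITH pairs AS (
  SELECT 1 AS id, 'apple' AS key UNION ALL
  SELECT 1, 'peach' UNION ALL
  SELECT 2, 'orange'
)
-- Collect the keys of each id into a single ARRAY<STRING>,
-- sorted so the element order is deterministic.
SELECT id, ARRAY_AGG(key ORDER BY key) AS keys
FROM pairs
GROUP BY id
```

With this input, id 1 ends up with the two keys apple and peach in one array, and id 2 with a single-element array.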
A STRUCT (the RECORD data type) does not need to be unnested; its fields can be read directly with dot notation. Only an ARRAY of STRUCTs (a REPEATED RECORD) has to be unnested, which produces one row per array element carrying all of that struct's fields. You can also select just a few fields from an array of structs by unnesting it and referencing those fields with ".".
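A minimal sketch of unnesting an array of structs, using a one-row table shaped like the `b` table in the question. The literal rows and the alias `c` are assumptions made for illustration.

```sql
#standardSQL
-- One row with a REPEATED RECORD column, written as an array of STRUCTs.
WITH b AS (
  SELECT '192.168.1.1' AS ip,
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('peach' AS key, 'pink' AS value)] AS cookie
)
-- The comma before UNNEST is a correlated cross join:
-- each element of b.cookie becomes its own output row.
SELECT b.ip, c.key, c.value
FROM b, UNNEST(b.cookie) AS c
```

This yields one row per cookie, and `c.key` and `c.value` pick individual struct fields with dot notation.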
Here is an alternative solution that avoids running a JOIN inside a correlated subquery and instead relies on the IN UNNEST() expression, which should give better performance:
#standardSQL
WITH a AS (
SELECT 1 AS id, [2,4] AS a_arr UNION ALL
SELECT 2, [3,5]
),
b AS (
SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a, b
-- Keep a pair only if every element of a.a_arr also appears in b.b_arr
WHERE (SELECT LOGICAL_AND(a_i IN UNNEST(b.b_arr)) FROM UNNEST(a.a_arr) a_i)
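The same pattern can be sketched against the cookie schema from the question. This is an adaptation, not something the answer itself shows: it assumes the cookies are stored as ARRAY<STRUCT<key STRING, value STRING>> and that IN UNNEST() accepts struct elements (BigQuery documents STRUCT equality as a field-by-field comparison). If that does not hold in your environment, comparing on `a_c.key` alone against an array of keys is a possible fallback.

```sql
#standardSQL
WITH b AS (
  SELECT '192.168.1.1' AS ip,
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('peach' AS key, 'pink' AS value),
          STRUCT('orange' AS key, 'orange' AS value)] AS cookie UNION ALL
  SELECT '192.168.1.2',
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('orange' AS key, 'orange' AS value)]
),
a AS (
  SELECT '12345' AS id,
         [STRUCT('peach' AS key, 'pink' AS value)] AS cookie UNION ALL
  SELECT '67890',
         [STRUCT('apple' AS key, 'red' AS value),
          STRUCT('orange' AS key, 'orange' AS value)]
)
SELECT b.ip, a.id
FROM a, b
-- Keep the (ip, id) pair only if every cookie of a is also a cookie of b.
WHERE (
  SELECT LOGICAL_AND(a_c IN UNNEST(b.cookie))
  FROM UNNEST(a.cookie) AS a_c
)
```

By the subset rule, id 12345 (peach only) can pair only with 192.168.1.1, while id 67890 (apple and orange) pairs with both addresses.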