
How do you compare two arrays in BigQuery?

I am trying to join two tables that each have an array column, like the following:

SELECT a.id, b.value
FROM a INNER JOIN b
ON a.array IN b.array

or

SELECT a.id, b.value
FROM a INNER JOIN b
ON UNNEST(a.array) IN UNNEST(b.array)

According to this SO question, Postgres has operators like <@ and @> that test whether one array is a subset of the other (Postgres doc page), but BigQuery only allows a single element of an array to be compared with the other array, like the following:

a.arrayelement IN UNNEST(b.array)

Can it be done in BigQuery?

edit

This is the schema I am working with

WITH b AS (
    {  "ip": "192.168.1.1",
      "cookie": [
        { "key": "apple",
          "value": "red"
        },
        { "key": "peach",
          "value": "pink"
        },
        { "key": "orange",
          "value": "orange"
        }
      ]
    }
    ,{  "ip": "192.168.1.2",
      "cookie": [
        { "key": "apple",
          "value": "red"
        },
        { "key": "orange",
          "value": "orange"
        }
      ]
    }
   ),
a AS (
    {  "id": "12345",
      "cookie": [
        { "key": "peach",
          "value": "pink"
        }
      ]
    }
    ,{  "id": "67890",
      "cookie": [
        { "key": "apple",
          "value": "red"
        },
        { "key": "orange",
          "value": "orange"
        }
      ]
     }
)

I am expecting an output like the following

ip, id
192.168.1.1, 67890 
192.168.1.2, 67890 
192.168.1.2, 12345

This is a continuation of the SO question "How do I find elements in an array in BigQuery". I tried using subqueries to compare a single element of one of the arrays, but BigQuery returns an error saying that I have "too many subqueries".

dorachan2010 asked Apr 27 '17 19:04



1 Answer

Here is an alternative solution that avoids running a JOIN inside a correlated subquery and instead relies on the IN UNNEST() expression, which should give better performance:

#standardSQL
WITH a AS (
  SELECT 1 AS id, [2,4] AS a_arr UNION ALL
  SELECT 2, [3,5]
),
b AS (
  SELECT 11 AS value, [1,2,3,4] AS b_arr UNION ALL
  SELECT 12, [1,3,5,6]
)
SELECT a.id, b.value
FROM a, b
-- Keep a pair only if every element of a.a_arr also appears in b.b_arr.
WHERE (SELECT LOGICAL_AND(a_i IN UNNEST(b.b_arr)) FROM UNNEST(a.a_arr) a_i)
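
As a rough sketch of how the same pattern might carry over to the cookie schema from the question: this assumes the cookies are stored as ARRAY<STRUCT<key STRING, value STRING>>, and it relies on BigQuery allowing STRUCT values in IN comparisons. The inline sample rows simply restate the question's data, and the alias c is only illustrative.

```sql
#standardSQL
WITH b AS (
  SELECT '192.168.1.1' AS ip, [
    STRUCT('apple' AS key, 'red' AS value),
    STRUCT('peach' AS key, 'pink' AS value),
    STRUCT('orange' AS key, 'orange' AS value)
  ] AS cookie UNION ALL
  SELECT '192.168.1.2', [
    STRUCT('apple' AS key, 'red' AS value),
    STRUCT('orange' AS key, 'orange' AS value)
  ]
),
a AS (
  SELECT '12345' AS id, [
    STRUCT('peach' AS key, 'pink' AS value)
  ] AS cookie UNION ALL
  SELECT '67890', [
    STRUCT('apple' AS key, 'red' AS value),
    STRUCT('orange' AS key, 'orange' AS value)
  ]
)
SELECT b.ip, a.id
FROM a, b
-- Keep a pair only if every (key, value) cookie of a is also present in b.cookie.
WHERE (SELECT LOGICAL_AND(c IN UNNEST(b.cookie)) FROM UNNEST(a.cookie) AS c)
```

One caveat: LOGICAL_AND returns NULL when it sees no input rows, so a row whose array is empty is dropped by this filter. If an empty array should count as a subset of every other array, wrapping the subquery in IFNULL(..., TRUE) is one option.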
Mosha Pasumansky answered Sep 17 '22 08:09