I have a dataset with two columns in BigQuery:
1. url
2. tags
URL is a single value, and TAGS is an array(example below):
row | URL  | TAGS
Can the filters in DataStudio work on BigQuery arrays i.e. tags column?
I would like to click on the filter dropdown in DataStudio and see a list:
1. donkey
2. kong
3. lives
4. here
Below is the code to query BigQuery arrays, I have to use where exists and unnest to get results from Tags column
select * FROM `any_table`
where exists(select * from unnest(tags) as tags where tags = 'donkey')
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#scanning-for-specific-values
Working with arrays is easy in Data Studio, as they're natively supported. Just set up a filter for the TAGS
field, and it will work just as if those were all separate fields. The semantics will be that a chart or table being filtered will only include the top-level row if at least one item in the array matches the filter.
There are some caveats though.
Your table mockup is pretty much what happens, so if you were to make a scorecard in Data Studio for Count
of URL
, you would get 1. If you did the same Count
on TAGS
, you would get 4. In this way, I think be behavior is ideal for what most people would expect—you get access to all of your original data, much like you would see in the BigQuery results grid.
If you're using BI Engine to accelerate your dashboards, however, I've noticed that the behavior is sometimes a little different . . . namely that it seems more like the data gets CROSS JOIN
'd, duplicating the non-array values to match the rows of the arrays. So with out count example, a table with both COUNT(URL)
and COUNT(TAGS)
would show the value of both to be 4
. This can mess up some counting/summing, so just be mindful and careful when working with this. This only happens when you select both an array field AND a non-array field in the same chart/table . . . otherwise the behavior is what you'd expect. This is especially confusing for certain comparisons that can't be accelerated by BI Engine, since this changes the semantics potentially breaking your report. If you're only using things like COUNT_DISTINCT
for those top-level fields, those will work fine either way. Otherwise, you may need to restructure your data to be safe.
If you have a vert large dataset and are concerned about reading more data than needed, you can also use Data Source Parameters as a way to send the filter value to the query, then only return the results already filtered on the BigQuery side—although most people would prefer the flexibility of doing this directly in Data Studio.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With