How to filter BigQuery arrays with DataStudio

Question

I have a dataset with two columns in BigQuery:
1. url
2. tags

URL is a single value, and TAGS is an array(example below):

Can the filters in DataStudio work on BigQuery arrays i.e. tags column?

I would like to click on the filter dropdown in DataStudio and see a list:

1. donkey
2. kong
3. lives
4. here

Below is the code to query BigQuery arrays, I have to use where exists and unnest to get results from Tags column

select * FROM `any_table`
where exists(select * from unnest(tags) as tags where tags = 'donkey')

https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#scanning-for-specific-values

justbeez · Accepted Answer

Working with arrays is easy in Data Studio, as they're natively supported. Just set up a filter for the TAGS field, and it will work just as if those were all separate fields. The semantics will be that a chart or table being filtered will only include the top-level row if at least one item in the array matches the filter.

There are some caveats though.

Your table mockup is pretty much what happens, so if you were to make a scorecard in Data Studio for Count of URL, you would get 1. If you did the same Count on TAGS, you would get 4. In this way, I think be behavior is ideal for what most people would expect—you get access to all of your original data, much like you would see in the BigQuery results grid.

If you're using BI Engine to accelerate your dashboards, however, I've noticed that the behavior is sometimes a little different . . . namely that it seems more like the data gets CROSS JOIN'd, duplicating the non-array values to match the rows of the arrays. So with out count example, a table with both COUNT(URL) and COUNT(TAGS) would show the value of both to be 4. This can mess up some counting/summing, so just be mindful and careful when working with this. This only happens when you select both an array field AND a non-array field in the same chart/table . . . otherwise the behavior is what you'd expect. This is especially confusing for certain comparisons that can't be accelerated by BI Engine, since this changes the semantics potentially breaking your report. If you're only using things like COUNT_DISTINCT for those top-level fields, those will work fine either way. Otherwise, you may need to restructure your data to be safe.

If you have a vert large dataset and are concerned about reading more data than needed, you can also use Data Source Parameters as a way to send the filter value to the query, then only return the results already filtered on the BigQuery side—although most people would prefer the flexibility of doing this directly in Data Studio.

How to filter BigQuery arrays with DataStudio

Tags:

google-bigquery

google-data-studio

Machine Learning

1 Answers

justbeez

Recent Activity

Donate For Us

How to filter BigQuery arrays with DataStudio

Tags:

google-bigquery

google-data-studio

Machine Learning

1 Answers

justbeez

Related questions

Recent Activity

Donate For Us