Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to filter BigQuery arrays with DataStudio

I have a dataset with two columns in BigQuery:
1. url
2. tags

URL is a single value, and TAGS is an array(example below):

row | URL   &nbsp| TAGS
1      | x.com | donkey
                     | kong
                     | lives
                     | here


Can the filters in DataStudio work on BigQuery arrays i.e. tags column?

I would like to click on the filter dropdown in DataStudio and see a list:

1. donkey
2. kong
3. lives
4. here

Below is the code to query BigQuery arrays, I have to use where exists and unnest to get results from Tags column

select * FROM `any_table`
where exists(select * from unnest(tags) as tags where tags = 'donkey')

https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#scanning-for-specific-values

like image 395
Machine Learning Avatar asked Jan 27 '23 00:01

Machine Learning


1 Answers

Working with arrays is easy in Data Studio, as they're natively supported. Just set up a filter for the TAGS field, and it will work just as if those were all separate fields. The semantics will be that a chart or table being filtered will only include the top-level row if at least one item in the array matches the filter.

There are some caveats though.

Your table mockup is pretty much what happens, so if you were to make a scorecard in Data Studio for Count of URL, you would get 1. If you did the same Count on TAGS, you would get 4. In this way, I think be behavior is ideal for what most people would expect—you get access to all of your original data, much like you would see in the BigQuery results grid.

If you're using BI Engine to accelerate your dashboards, however, I've noticed that the behavior is sometimes a little different . . . namely that it seems more like the data gets CROSS JOIN'd, duplicating the non-array values to match the rows of the arrays. So with out count example, a table with both COUNT(URL) and COUNT(TAGS) would show the value of both to be 4. This can mess up some counting/summing, so just be mindful and careful when working with this. This only happens when you select both an array field AND a non-array field in the same chart/table . . . otherwise the behavior is what you'd expect. This is especially confusing for certain comparisons that can't be accelerated by BI Engine, since this changes the semantics potentially breaking your report. If you're only using things like COUNT_DISTINCT for those top-level fields, those will work fine either way. Otherwise, you may need to restructure your data to be safe.

If you have a vert large dataset and are concerned about reading more data than needed, you can also use Data Source Parameters as a way to send the filter value to the query, then only return the results already filtered on the BigQuery side—although most people would prefer the flexibility of doing this directly in Data Studio.

like image 87
justbeez Avatar answered May 16 '23 08:05

justbeez