
How to use regex in BigQuery

I am unable to apply a proper regex to the CustomTargeting column in BigQuery.

With normal MSSQL:

SELECT * from mytable where CustomTargeting like '%u=%'  -- is all okay

With BigQuery (legacy SQL):

SELECT REGEXP_EXTRACT(CustomTargeting, r'[^u=\d]') as validate_users
from [project:dataset.impressions_4213_20181112] Limit 10

Error:

Exactly one capturing group must be specified

Update:

I still couldn't extract the substring u='anystring'.


How can I extract data where CustomTargeting is like '%u=somestring%'?

Shivkumar kondi asked Dec 04 '18 14:12



1 Answer

For BigQuery Legacy SQL

In the SELECT list you can use
SELECT REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)')

In the WHERE clause you can use
WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')

So, your query can look like

#legacySQL
SELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)') 
FROM [project:dataset.impressions_4213_20181112]
WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')   
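
The error in the question ("Exactly one capturing group must be specified") is consistent with the pattern r'[^u=\d]' containing no capturing group at all, while the pattern used here has exactly one, because (?:^|;) is non-capturing. A minimal sketch of that count, using Python's re module as a stand-in for BigQuery's regex engine (an assumption: the two engines are not identical, although both patterns use only syntax common to both):

```python
import re

# Pattern from the question: a single character class, no parentheses at all.
question_pattern = re.compile(r'[^u=\d]')

# Pattern from the answer: (?:^|;) is non-capturing, (\d*) is the one capturing group.
answer_pattern = re.compile(r'(?:^|;)u=(\d*)')

# .groups reports how many capturing groups a compiled pattern contains.
print(question_pattern.groups)  # 0
print(answer_pattern.groups)    # 1
```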

For BigQuery Standard SQL

The SELECT expression is the same.
The WHERE clause is different: WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')

#standardSQL
SELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)') 
FROM `project.dataset.impressions_4213_20181112`
WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')  

Update, to address the provided data example:

The regular expression was updated from r'^u=(\d*)' to r'(?:^|;)u=(\d*)'. Hopefully it is self-descriptive, but if not: it lets the match be found either at the beginning of the string or after a ;
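
To make the difference between the two patterns concrete, here is a rough sketch in Python. It assumes CustomTargeting holds semicolon-separated key=value pairs (implied by the ; in the pattern, but not confirmed by the thread). The sample values and the regexp_extract helper are hypothetical, and Python's re module only approximates BigQuery's regex engine.

```python
import re

OLD_PATTERN = re.compile(r'^u=(\d*)')         # matches only at the start of the string
NEW_PATTERN = re.compile(r'(?:^|;)u=(\d*)')   # matches at the start or right after a ';'


def regexp_extract(pattern, value):
    """Hypothetical helper: return the first capturing group of the first match, or None."""
    match = pattern.search(value)
    return match.group(1) if match else None


# Hypothetical CustomTargeting values; the real data is not shown in the thread.
samples = ["u=12345;pos=top", "pos=top;u=678", "pos=top;menu=5"]

for value in samples:
    print(value, regexp_extract(OLD_PATTERN, value), regexp_extract(NEW_PATTERN, value))
```

With these samples, only the updated pattern finds u=678 in the middle of the string, and neither pattern matches inside the longer key name menu=5.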

Mikhail Berlyant answered Sep 22 '22 02:09