Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Equivalent of string contains in google bigquery

I have a table like as shown below

enter image description here

I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in Postgresql and google bigquery

I tried the below but it doesn't work

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1


SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

I expect my output to be like as shown below

enter image description here

like image 396
The Great Avatar asked Oct 02 '19 08:10

The Great


2 Answers

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id   

if to apply to sample data from your question - result is

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1     

Note: instead of simple LIKE ending with lengthy and redundant text - I am using LIKE on steroids - which is REGEXP_CONTAINS

like image 169
Mikhail Berlyant Avatar answered Nov 17 '22 13:11

Mikhail Berlyant


In Postgres, I would recommend using the filter clause:

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin,
from db.Team01.Table_1
group by subject_id;

In BigQuery, I would recommend countif():

select subject_id,
       countif(regexp_contains(drug, 'cortisol|cortisone|dexamethasone') as steroids,
       countif(drug ~ ' 'peptide|paracetamol') as aspirin,
from db.Team01.Table_1
group by subject_id;

You can use sum(case when . . . end) as a more general approach. However, each database has a more "local" way of expressing this logic. By the way, the FILTER clause is standard SQL, just not widely adopted.

like image 4
Gordon Linoff Avatar answered Nov 17 '22 15:11

Gordon Linoff