NOT IN not working in Google BigQuery standard SQL

I am using Google BigQuery and am trying to find the userids in table2, excluding the ones that are stored in table1 two or more times. This is the code:

#standardSQL
WITH t100 AS (
  SELECT count_table.userid
  FROM (
    SELECT userid, COUNT(`project.dataset.table1`.userid) AS notification_count
    FROM `project.dataset.table1`
    GROUP BY userid
  ) AS count_table
  WHERE notification_count >= 2
)

SELECT userid FROM `project.dataset.table2` WHERE userid NOT IN (SELECT userid  FROM t100)

The problem is that this returns the userids that are stored in table1 two or more times. I have tried adding WHERE userid IS NOT NULL to SELECT userid FROM t100, yet it made no difference. To be clear: SELECT userid FROM t100 is not empty, and for some reason the userids it returns still show up in the result of the query above.

jaafar Nasrallah asked Jan 10 '17 07:01


3 Answers

i have tried adding WHERE userid IS NOT NULL to the SELECT userid FROM t100, yet it made no difference

This of course had no effect: COUNT(userid) AS notification_count always returns 0 for the group where userid is NULL, so that group was already filtered out by the notification_count >= 2 condition.
If you used COUNT(1) instead, that is where you could potentially get a NULL userid in the output of t100. So a NULL userid is definitely not the issue here.
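The COUNT(userid) versus COUNT(1) difference described above can be checked locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made only so the example runs anywhere; both engines follow the standard rule that COUNT(column) skips NULLs). The table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (userid TEXT)")
# Hypothetical sample data: 'a' twice, 'b' once, NULL three times.
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [("a",), ("a",), ("b",), (None,), (None,), (None,)],
)

# COUNT(userid) ignores NULL values; COUNT(1) counts every row in the group.
rows = conn.execute(
    """
    SELECT userid, COUNT(userid) AS count_col, COUNT(1) AS count_rows
    FROM table1
    GROUP BY userid
    """
).fetchall()
counts = {userid: (count_col, count_rows) for userid, count_col, count_rows in rows}


def users_with_at_least_two(count_expr):
    """Return the userids whose group passes HAVING <count_expr> >= 2."""
    sql = f"SELECT userid FROM table1 GROUP BY userid HAVING {count_expr} >= 2"
    return {row[0] for row in conn.execute(sql)}


by_column = users_with_at_least_two("COUNT(userid)")
by_rows = users_with_at_least_two("COUNT(1)")

print(counts)
print(by_column, by_rows)
```

With COUNT(userid) the NULL group has a count of 0 and never reaches the output, while COUNT(1) lets the NULL group through.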

As others pointed out, your query should work. If you keep seeing the problem, you need to dig deeper into it and give us more details.

In the meantime, try the version below as yet another way to write your (otherwise good-looking) query:

#standardSQL
WITH t100 AS (
  SELECT userid
  FROM `project.dataset.table1`
  GROUP BY userid
  HAVING COUNT(userid) >= 2 
)
SELECT t2.userid
FROM `project.dataset.table2` AS t2
LEFT JOIN t100 ON t100.userid = t2.userid
WHERE t100.userid IS NULL
Mikhail Berlyant answered Oct 24 '22 06:10


It's due to null handling. There was a similar post on our issue tracker about NOT IN versus NOT EXISTS. The documentation for IN states:

IN with a NULL in the IN-list can only return TRUE or NULL, never FALSE
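The quoted rule can be observed directly by evaluating a few IN and NOT IN expressions. This is a small sketch using Python's built-in sqlite3 module rather than BigQuery (an assumption for the sake of a runnable example; SQLite applies the same three-valued logic to IN lists). SQLite reports TRUE as 1, FALSE as 0, and SQL NULL as Python's None.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a scalar SQL expression and return its single value."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


# The IN-list contains a NULL in every case below.
in_match = evaluate("1 IN (1, NULL)")              # value is in the list
in_no_match = evaluate("1 IN (2, NULL)")           # value is not in the list
not_in_match = evaluate("1 NOT IN (1, NULL)")      # value is in the list
not_in_no_match = evaluate("1 NOT IN (2, NULL)")   # value is not in the list

print(in_match, in_no_match, not_in_match, not_in_no_match)
```

Because `1 NOT IN (2, NULL)` evaluates to NULL rather than TRUE, a WHERE clause built on it discards the row.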

To achieve the semantics that you want, you should use an anti-semijoin (NOT EXISTS). For example:

#standardSQL
WITH t100 AS (
  SELECT
    userid,
    COUNT(userid) as notification_count 
  FROM `project.dataset.table1`
  GROUP BY userid
  HAVING notification_count >= 2 
)
SELECT userid
FROM `project.dataset.table2` AS t2
WHERE NOT EXISTS (SELECT 1 FROM t100 WHERE t100.userid = t2.userid);
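To see how NOT IN, NOT EXISTS, and a LEFT JOIN anti-join differ once a NULL is present, here is a rough side-by-side sketch. It runs on Python's built-in sqlite3 module instead of BigQuery (an assumption), and t100 is a plain table with a NULL row inserted on purpose; in the question's actual query, t100 would not contain a NULL because COUNT(userid) filters that group out.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table2 (userid TEXT);
    CREATE TABLE t100 (userid TEXT);
    INSERT INTO table2 VALUES ('a'), ('b'), ('c');
    -- Hypothetical data: the NULL row is added only to expose the edge case.
    INSERT INTO t100 VALUES ('a'), (NULL);
    """
)


def userids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


not_in = userids(
    """
    SELECT userid FROM table2
    WHERE userid NOT IN (SELECT userid FROM t100)
    ORDER BY userid
    """
)
not_exists = userids(
    """
    SELECT userid FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM t100 WHERE t100.userid = t2.userid)
    ORDER BY userid
    """
)
left_join = userids(
    """
    SELECT t2.userid FROM table2 AS t2
    LEFT JOIN t100 ON t100.userid = t2.userid
    WHERE t100.userid IS NULL
    ORDER BY t2.userid
    """
)

print(not_in, not_exists, left_join)
```

In this sketch the NULL makes NOT IN return no rows at all, whereas NOT EXISTS and the LEFT JOIN form both return 'b' and 'c'.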
Elliott Brossard answered Oct 24 '22 05:10


I'm not sure why this isn't working, but as a general principle I never use (NOT) IN in combination with a SELECT statement. Instead, I would left outer join the subquery and filter on NULL values in it:

#standardSQL

with t100 as (
select
  count_table.userid

from(
select
  userid
  ,count(`project.dataset.table1`.userid) as notification_count 

from `project.dataset.table1`

group by
  userid
) as count_table 

where notification_count >= 2 
)

select
  t2.userid as userid

from `project.dataset.table2` t2
left outer join t100
  on t100.userid = t2.userid

where t100.userid is null
oulenz answered Oct 24 '22 05:10