I am using Google BigQuery, and I am trying to find the 'userid's from 'table2', excluding the ones that are stored in 'table1' 2 or more times. This is the code:
#standardSQL
WITH t100 AS (
SELECT count_table.userid
FROM (
SELECT userid, COUNT(`project.dataset.table1`.userid) AS notification_count
FROM `project.dataset.table1`
GROUP BY userid) AS count_table
WHERE notification_count >= 2
)
SELECT userid FROM `project.dataset.table2` WHERE userid NOT IN (SELECT userid FROM t100)
The problem is that this still returns the 'userid's from 'table1' that are stored 2 or more times. I have tried adding WHERE userid IS NOT NULL to the SELECT userid FROM t100, yet it made no difference.
Just so that everything is clearer: SELECT userid FROM t100 is not empty, and the rows it returns for some reason still show up in the result of the first query above.
"I have tried adding WHERE userid IS NOT NULL to the SELECT userid FROM t100, yet it made no difference"
This of course had no effect: COUNT(userid) AS notification_count always returns 0 for the group where userid is NULL, so that group was already filtered out by the notification_count >= 2 condition.
If you used COUNT(1) instead, that is where you could potentially get a NULL userid in the output of t100. So a NULL userid is definitely not the issue here.
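To make the COUNT(userid) versus COUNT(1) point concrete, here is a minimal sketch that can be run on its own. The CTE sample_table1 and its values are made up for illustration and only stand in for `project.dataset.table1`.

```sql
#standardSQL
-- Hypothetical inline sample standing in for `project.dataset.table1`.
WITH sample_table1 AS (
  SELECT 'a' AS userid UNION ALL
  SELECT 'a' UNION ALL
  SELECT CAST(NULL AS STRING) UNION ALL
  SELECT CAST(NULL AS STRING)
)
SELECT
  userid,
  COUNT(userid) AS count_userid,  -- counts only non-NULL userid values
  COUNT(1) AS count_rows          -- counts every row in the group
FROM sample_table1
GROUP BY userid
```

For the NULL group, count_userid is 0 while count_rows is 2, so a filter such as count_userid >= 2 drops that group, whereas the same filter on count_rows would keep it.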
As others pointed out, your query should work, so if the problem persists you will need to dig further into it and provide us with more details.
Meanwhile, try the version below as yet another variant of your (otherwise good-looking) query:
#standardSQL
WITH t100 AS (
SELECT userid
FROM `project.dataset.table1`
GROUP BY userid
HAVING COUNT(userid) >= 2
)
SELECT userid
FROM `project.dataset.table2` AS t2
LEFT JOIN t100 ON t100.userid = t2.userid
WHERE t100.userid IS NULL
It's due to null handling. There was a similar post on our issue tracker about NOT IN versus NOT EXISTS. The documentation for IN states:
IN with a NULL in the IN-list can only return TRUE or NULL, never FALSE
To achieve the semantics that you want, you should use an anti-semijoin (NOT EXISTS). For example:
#standardSQL
WITH t100 AS (
SELECT
userid,
COUNT(userid) as notification_count
FROM `project.dataset.table1`
GROUP BY userid
HAVING notification_count >= 2
)
SELECT userid
FROM `project.dataset.table2` AS t2
WHERE NOT EXISTS (SELECT 1 FROM t100 WHERE userid = t2.userid);
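As a rough illustration of the null-handling difference the quoted documentation describes, the sketch below evaluates both predicates side by side. The CTEs candidates and excluded and their values are hypothetical, chosen so that the excluded list contains a NULL.

```sql
#standardSQL
-- Hypothetical inline samples; the names and values are illustrative only.
WITH candidates AS (
  SELECT 'a' AS userid UNION ALL
  SELECT 'b'
),
excluded AS (
  SELECT 'a' AS userid UNION ALL
  SELECT CAST(NULL AS STRING)
)
SELECT
  c.userid,
  -- NOT IN yields NULL when no match is found but the list contains a NULL
  c.userid NOT IN (SELECT userid FROM excluded) AS kept_by_not_in,
  -- NOT EXISTS only ever yields TRUE or FALSE
  NOT EXISTS (
    SELECT 1 FROM excluded AS e WHERE e.userid = c.userid
  ) AS kept_by_not_exists
FROM candidates AS c
```

For 'a' both columns are FALSE. For 'b', kept_by_not_in is NULL while kept_by_not_exists is TRUE, so a WHERE clause built on NOT IN would drop 'b' and one built on NOT EXISTS would keep it.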
Not sure why this isn't working, but as a general principle I never use (NOT) IN in combination with a SELECT statement. Rather, I would LEFT OUTER JOIN the subquery and filter on null values therein:
#standardSQL
with t100 as (
select
count_table.userid
from(
select
userid
,count(`project.dataset.table1`.userid) as notification_count
from `project.dataset.table1`
group by
userid
) as count_table
where notification_count >= 2
)
select
t2.userid as userid
from `project.dataset.table2` t2
left outer join t100
on t100.userid = t2.userid
where t100.userid is null