I'm working on a project, where we need to count the number of distinct rows. A simplified version of the scenario includes a user
table, a keyword
table and a keyword_user
table.
The user
table just includes common user meta data, like name etc. The other tables are listed below.
keyword_user:
id
user_id
keyword_id
keyword:
id,
description
What I want to do, is to find a max number of users (5), based on the users keyword_id's, while also counting the total number of matching rows. The count must be distinct.
The query:
SELECT TOP 5 u.[id],
u.[firstname],
u.[lastname],
total = Count(*) OVER()
FROM [user] u
INNER JOIN [keyword_user] ku
ON u.[id] = ku.[user_id]
WHERE ( ku.keyword_id IN ( '5f6501ec-0a71-4067-a21d-3c5f87a76411', 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf'))
AND u.id NOT IN ( '12db3001-b3b9-4626-8a02-2519102cb53a' )
Resultset:
+--------------------------------------+-----------+----------+-------+
| id | firstname | lastname | total |
+--------------------------------------+-----------+----------+-------+
| F0527AC3-747A-45A6-9CF9-B1F6C7F548F8 | Kasper | Thomsen | 3 |
| 95988F6D-9C91-4779-B6C3-3D4B4D6AE836 | Michael | Jacobsen | 3 |
| 95988F6D-9C91-4779-B6C3-3D4B4D6AE836 | Michael | Jacobsen | 3 |
+--------------------------------------+-----------+----------+-------+
PROBLEM:
The problem being here, that Michael is counted twice, and the total count therefore is 3, when i want it to be 2. When using count() over()
you cannot parse an expression into it, that contains distinct. Also, if I just SELECT DISTINCT
, my resultset looks fine, besides the total count, which would still be 3.
If I need to include more information to support the question, please let me know, and I will try to answer the best I can.
MSSQL CREATE DB SCRIPT (SAMPLE DATA)
example_data.sql
wanted resultset:
+--------------------------------------+-----------+----------+-------+
| id | firstname | lastname | total |
+--------------------------------------+-----------+----------+-------+
| F0527AC3-747A-45A6-9CF9-B1F6C7F548F8 | Kasper | Thomsen | 2 |
| 95988F6D-9C91-4779-B6C3-3D4B4D6AE836 | Michael | Jacobsen | 2 |
+--------------------------------------+-----------+----------+-------+
You really should explain what you need in the question, not in comments.
In CTE_Users
we find all distinct users for the given keywords.
Then join the result with user
to get user details. At least it produces the result that you expect with the given small sample data.
WITH
CTE_Users
AS
(
SELECT DISTINCT ku.user_id
FROM
keyword_user AS ku
WHERE
ku.keyword_id IN (
'5f6501ec-0a71-4067-a21d-3c5f87a76411',
'c19b95c0-8554-4bbd-9526-db8f1c4f1edf')
AND ku.user_id NOT IN (
'12db3001-b3b9-4626-8a02-2519102cb53a')
)
SELECT TOP(5)
u.id
,u.firstname
,u.lastname
,COUNT(*) OVER() AS total
FROM
user AS u
INNER JOIN CTE_Users ON CTE_Users.user_id = u.id
;
i'm a little confused in your case specially with the "keywords" and how they're related per user (this is just a process issue for me) thus found myself doing an outer group by containing your initial query as my source table.
Please comment below so we can improve this.
SELECT
id
, firstname
, lastname
, total
, COUNT(*) AS [per_user_count]
FROM (
SELECT TOP 5 u.[id],
u.[firstname],
u.[lastname],
total = Count(*) OVER()
FROM [user] u
INNER JOIN [keyword_user] ku
ON u.[id] = ku.[user_id]
WHERE
(
ku.keyword_id IN (
'5f6501ec-0a71-4067-a21d-3c5f87a76411'
, 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf'
)
)
AND u.id NOT IN ('12db3001-b3b9-4626-8a02-2519102cb53a')
) AS T
GROUP BY
T.id
, T.firstname
, T.lastname
, T.total
EDIT : We really had a confusion there, then so I created a simpler script that shall exclude the keywords, just the unique users (to generate the total overall) and get the top 5 of it (random order).
SELECT
TOP 5
T.id
, T.firstname
, T.lastname
, Total = COUNT(*) OVER()
FROM (
SELECT DISTINCT
u.*
FROM [keyword_user] ku
LEFT JOIN [user] u
ON
ku.user_id = u.id
WHERE
(
ku.keyword_id IN (
'5f6501ec-0a71-4067-a21d-3c5f87a76411'
, 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf')
)
AND ku.[user_id] NOT IN (
'12db3001-b3b9-4626-8a02-2519102cb53a'
)
) AS T
Thanks
EDIT : Your scenario is a straight forward "search of keywords" that is linked to an entity, with an overall count and a top 5 result. As how I understand the CTE (and based in MSDN), CTE is a great solution for hierarchical data mining (no need to do while and do whatever back-flip to get your organizational hierarchy) which doesn't really fit the scenario that we have here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With