I have an SQL table with 4 columns:
id
- varchar(50)g1
- varchar(50)g2
- varchar(50)datetime
- timestamp I have this query:
SELECT g1,
COUNT(DISTINCT id),
SUM(COUNT(DISTINCT id)) OVER () AS total,
(CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
and g2 = 'start'
GROUP BY 1
order by share desc
This query was built to answer: What is the distributions of g1
value out of the users?
Each id
may have multiple records in the table. I wish to consider the earliest one. early means, the minimum datetime
value.
id g1 g2 datetime
x1 a start 2016-01-19 21:01:22
x1 c start 2016-01-19 21:01:21
x2 b start 2016-01-19 09:03:42
x1 a start 2016-01-18 13:56:45
g1 count total share
a 2 4 0.5
b 1 4 0.25
c 1 4 0.25
we have 4 records, but I only want to consider two records:
x2 b start 2016-01-19 09:03:42
x1 a start 2016-01-18 13:56:45
which are the earliest records per id
.
g1 count total share
a 1 2 0.5
b 1 2 0.5
How do I consider only the earliest record, per id
, in the group by
Here is a solution which should work in SQL Server, and any database which supports CTE:
WITH cte AS
(
SELECT t1.g1,
COUNT(*) AS count
FROM yourTable t1
INNER JOIN
(
SELECT id, MIN(datetime) AS datetime
FROM yourTable
GROUP BY id
) t2
ON t1.id = t2.id AND
t1.datetime = t2.datetime
)
SELECT t.g1,
t.count,
(SELECT COUNT(*) FROM cte) AS total,
t.count / (SELECT COUNT(*) FROM cte) AS share
FROM cte t
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With