I am checking website entries that are recorded in a database.
Columns: browser, click_type_id, referrer, and datetime.
If multiple rows have the same browser, click_type_id, and referrer, and their timestamps fall within 1 minute of one another, they are considered duplicates.
I need a SQL statement that can query for these duplicates based on the above criteria.
Any help is appreciated.
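The go-to solution for removing duplicate rows from your result sets is to include the DISTINCT keyword in your SELECT statement. It tells the query engine to remove duplicates to produce a result set in which every row is unique. Note that DISTINCT only collapses rows whose selected columns are identical, so on its own it will not catch rows whose timestamps differ by a few seconds.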
SELECT
T1.browser,
T1.click_type_id,
T1.referrer,
T1.datetime,
T2.datetime
FROM
My_Table T1
INNER JOIN My_Table T2 ON
T2.browser = T1.browser AND
T2.click_type_id = T1.click_type_id AND
T2.referrer = T1.referrer AND
T2.datetime > T1.datetime AND
T2.datetime <= DATEADD(mi, 1, T1.datetime)
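To try the self-join on a small data set, here is a minimal sketch. The table variable @clicks, the column types, and the sample rows are made up for illustration; only the column names come from the question. It also assumes SQL Server 2008 or later for the multi-row VALUES clause.

```sql
-- Hypothetical sample data; @clicks stands in for the real table.
DECLARE @clicks TABLE (
    browser       varchar(50),
    click_type_id int,
    referrer      varchar(200),
    [datetime]    datetime
);

INSERT INTO @clicks (browser, click_type_id, referrer, [datetime]) VALUES
    ('Firefox', 1, 'example.com', '2010-01-01T10:00:00'),
    ('Firefox', 1, 'example.com', '2010-01-01T10:00:40'),  -- 40 seconds after the first row
    ('Firefox', 1, 'example.com', '2010-01-01T10:05:00'),  -- more than a minute later
    ('Chrome',  1, 'example.com', '2010-01-01T10:00:10');  -- different browser

-- Pair each row with any later row that matches on the three columns
-- and falls within one minute of it.
SELECT
    T1.browser,
    T1.click_type_id,
    T1.referrer,
    T1.[datetime] AS first_click,
    T2.[datetime] AS duplicate_click
FROM @clicks T1
INNER JOIN @clicks T2 ON
    T2.browser = T1.browser AND
    T2.click_type_id = T1.click_type_id AND
    T2.referrer = T1.referrer AND
    T2.[datetime] > T1.[datetime] AND
    T2.[datetime] <= DATEADD(mi, 1, T1.[datetime]);
```

With this data only the 10:00:00 / 10:00:40 Firefox pair should come back. Two things to keep in mind: rows with exactly the same timestamp are not paired because of the strict > comparison, and rows where referrer is NULL never match each other with a plain = comparison.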
To prevent duplicate inserts:
INSERT INTO MyTable (browser, click_type_id, referrer, [datetime])
SELECT
@browser, @click_type_id, @referrer, @datetime
WHERE
NOT EXISTS (SELECT *
FROM
MyTable M2
WHERE
browser = @browser AND click_type_id = @click_type_id AND referrer = @referrer
AND
-- skip the insert if a matching row exists within the previous minute
[datetime] >= DATEADD(minute, -1, @datetime))
To find duplicates in existing data (this relies on smalldatetime's one-minute accuracy and may help to avoid the issues raised in the comment on the question):
SELECT
browser, click_type_id, referrer, COUNT(*)
FROM
MyTable
GROUP BY
browser, click_type_id, referrer, CAST([datetime] AS smalldatetime)
HAVING
COUNT(*) > 1
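One caveat with grouping on smalldatetime: it buckets rows by rounded minute rather than comparing each pair of rows, so two clicks only a couple of seconds apart can land in different buckets. A quick sketch of the rounding, using made-up timestamps:

```sql
-- smalldatetime keeps one-minute precision: seconds below 30 round down,
-- seconds of 30 and above round up to the next minute.
SELECT
    CAST(CAST('2010-01-01T10:00:29' AS datetime) AS smalldatetime) AS a,  -- 10:00
    CAST(CAST('2010-01-01T10:00:31' AS datetime) AS smalldatetime) AS b,  -- 10:01
    CAST(CAST('2010-01-01T10:01:29' AS datetime) AS smalldatetime) AS c;  -- 10:01
```

Here a and b are two seconds apart but fall into different groups, while b and c are 58 seconds apart and share a group. The GROUP BY query therefore never flags rows more than a minute apart, but it can miss pairs that straddle a rounding boundary; the self-join approach does not have that gap.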