Table cleanup, finding duplicate posts in time span

Question

I have a structure like this in a few tables: id, [...], validfrom, validto.

The id is a NUMBER, and the validfrom and validto columns are of type DATE. For any given date there should be at most one post (row) per id.

So this is a correct example:

id, validfrom, validto
1, 2000-01-01, 2000-02-20
1, 2000-02-21, 2000-03-02
1, 2000-03-03, 2099-12-31

However, there seem to be some cases where certain dates would return more than one row. Something like this (which is corrupt data):

id, validfrom, validto
1, 2001-01-01, 2001-02-20
1, 2001-01-15, 2001-03-02
1, 2001-03-03, 2099-12-31

So in the above example, any date between 2001-01-15 and 2001-02-20 would return two rows.

How would I construct a script that finds all these corrupt posts?

Florin stands with Ukraine · Accepted Answer

Just to find them, assuming validfrom is no later than validto in every row:

select a.*, b.*
from your_table a
join your_table b
on (a.id = b.id and
    -- the two intervals overlap
    greatest(a.validfrom, b.validfrom) <= least(a.validto, b.validto) and
    -- do not join a row to itself
    a.rowid <> b.rowid
    )

This finds only intersecting intervals, because for two disjoint intervals the validfrom of one is always greater than the validto of the other.
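The overlap test relies on the stated assumption that validfrom never falls after validto. As a rough sanity check (a sketch only, reusing the placeholder name your_table from the query above), something like this would list any rows that break that assumption:

```sql
-- Rows whose interval is inverted (validfrom after validto).
-- your_table is the placeholder table name from the answer, not a real object.
select t.*
from your_table t
where t.validfrom > t.validto;
```

If this returns anything, those rows are probably worth fixing or excluding before trusting the overlap results.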

UPDATE: I replaced the condition not (a.validto=b.validto and a.validfrom=b.validfrom) with

a.rowid <> b.rowid

so that exact duplicate rows are now reported as well. (Thanks wolfi)
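Because the join is symmetric, each overlapping pair shows up twice (once as a, b and once as b, a). If that is too noisy, one possible variant, assuming the same your_table placeholder and the same validfrom/validto assumption, compares the rowids with < instead of <> so that each pair is listed once:

```sql
-- Sketch: report each overlapping pair a single time.
-- your_table is the placeholder name from the answer; the column aliases are illustrative.
select a.id,
       a.validfrom as a_validfrom,
       a.validto   as a_validto,
       b.validfrom as b_validfrom,
       b.validto   as b_validto
from your_table a
join your_table b
  on (a.id = b.id and
      -- the two intervals overlap
      greatest(a.validfrom, b.validfrom) <= least(a.validto, b.validto) and
      -- keep one ordering of each pair; this also excludes a row joined to itself
      a.rowid < b.rowid
     )
order by a.id, a.validfrom;
```

Exact duplicate rows are still reported, since two physical rows always have different rowids.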

Tags: oracle-database, sql, plsql

David W.