Only select first row of repeating value in a column in SQL

Question

I have a table with a column that may contain the same value in several consecutive rows (a burst). Like this:

+----+---------+
| id |   Col1  | 
+----+---------+
| 1  | 6050000 |
+----+---------+
| 2  | 6050000 |
+----+---------+
| 3  | 6050000 |
+----+---------+
| 4  | 6060000 |
+----+---------+
| 5  | 6060000 |
+----+---------+
| 6  | 6060000 |
+----+---------+
| 7  | 6060000 |
+----+---------+
| 8  | 6060000 |
+----+---------+
| 9  | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
| 11 | 6000000 |
+----+---------+

Now I want to prune rows where the value of Col1 is repeated and only select the first occurrence.
For the above table the result should be:

+----+---------+
| id |   Col1  | 
+----+---------+
| 1  | 6050000 |
+----+---------+
| 4  | 6060000 |
+----+---------+
| 9  | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+

How can I do this in SQL?
Note that only the repeated rows inside a burst should be removed; a value may appear again in a later, separate burst. In the sample result, id=1 and id=9 carry the same value.

EDIT:
I achieved it using this:

select id,col1 from data as d1
where not exists (
    Select id from data as d2
    where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)

But this only works when ids are sequential. With gaps between ids (deleted ones) the query breaks. How can I fix this?

The Pickle · Accepted Answer

select min(id), Col1 from tableName group by Col1
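A quick way to see what this query returns on the sample data is to run it through Python's built-in sqlite3 module. This is only a sketch: the table name `data` is borrowed from the question's own query, and the in-memory database is an assumption for illustration. Because GROUP BY yields one row per distinct value, the second burst of 6050000 (starting at id 9) does not get its own row.

```python
import sqlite3

# Sample rows copied from the question; table name `data` as in the asker's query.
rows = [(1, 6050000), (2, 6050000), (3, 6050000), (4, 6060000),
        (5, 6060000), (6, 6060000), (7, 6060000), (8, 6060000),
        (9, 6050000), (10, 6000000), (11, 6000000)]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# One row per distinct col1 value, keeping the smallest id of each group.
grouped = conn.execute(
    "SELECT min(id), col1 FROM data GROUP BY col1 ORDER BY 1"
).fetchall()

print(grouped)  # [(1, 6050000), (4, 6060000), (10, 6000000)]
```

Compared with the expected result in the question, the row with id 9 is missing.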

Erwin Brandstetter · Answer

You can use an EXISTS semi-join to identify candidates:

Select wanted rows:

SELECT * FROM tbl t
WHERE  NOT EXISTS (
    SELECT *
    FROM   tbl
    WHERE  col1 = t.col1
    AND    id = t.id - 1
    )
ORDER  BY id;

Get rid of unwanted rows:

DELETE FROM tbl AS t
-- SELECT * FROM tbl t  -- check first?
WHERE EXISTS (
    SELECT *
    FROM   tbl
    WHERE  col1 = t.col1
    AND    id   = t.id - 1
    );

This deletes every row whose preceding row (the one with id - 1) has the same value in col1, thereby arriving at your set goal: only the first row of every burst survives.

I left the commented SELECT statement because you should always check what is going to be deleted before you do the deed.
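The `id - 1` lookup can be tried out with Python's built-in sqlite3 module. The sketch below assumes an in-memory table `tbl` filled with the question's sample data, and a hypothetical helper `first_ids` that is only there for illustration. It runs the SELECT once on sequential ids and once after removing id 2, which shows why gaps are a problem for this approach.

```python
import sqlite3

rows = [(1, 6050000), (2, 6050000), (3, 6050000), (4, 6060000),
        (5, 6060000), (6, 6060000), (7, 6060000), (8, 6060000),
        (9, 6050000), (10, 6000000), (11, 6000000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# Keep a row unless the row with id - 1 exists and has the same col1.
QUERY = """
SELECT id FROM tbl t
WHERE  NOT EXISTS (
    SELECT * FROM tbl
    WHERE  col1 = t.col1
    AND    id   = t.id - 1
    )
ORDER  BY id
"""

def first_ids(connection):
    """Hypothetical helper: ids the query treats as the start of a burst."""
    return [row[0] for row in connection.execute(QUERY)]

sequential = first_ids(conn)

# Simulate a deleted row: id 3 now has no neighbour at id - 1.
conn.execute("DELETE FROM tbl WHERE id = 2")
gapped = first_ids(conn)

print(sequential)  # [1, 4, 9, 10]
print(gapped)      # [1, 3, 4, 9, 10]
```

With the gap, id 3 is reported as the start of a burst even though id 1 holds the same value.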

Solution for non-sequential IDs:

If your RDBMS supports CTEs and window functions (like PostgreSQL, Oracle, SQL Server, ... but not SQLite prior to v3.25, MS Access or MySQL prior to v8.0.2), there is an elegant way:

WITH cte AS (
    SELECT *, row_number() OVER (ORDER BY id) AS rn
    FROM   tbl
    )
SELECT id, col1
FROM   cte c
WHERE  NOT EXISTS (
    SELECT *
    FROM   cte
    WHERE  col1 = c.col1
    AND    rn   = c.rn - 1
    )
ORDER  BY id;
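To check the CTE version against ids with gaps, here is a sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite library is version 3.25 or later (needed for window functions), and the sample data with ids 2 and 5 left out is made up for illustration.

```python
import sqlite3

# Question's sample data with ids 2 and 5 removed to create gaps (illustrative).
rows = [(1, 6050000), (3, 6050000), (4, 6060000), (6, 6060000),
        (7, 6060000), (8, 6060000), (9, 6050000), (10, 6000000),
        (11, 6000000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# row_number() gives gapless positions, so "rn - 1" always finds the
# previous row, whatever the ids look like. Requires SQLite >= 3.25.
result = conn.execute("""
WITH cte AS (
    SELECT *, row_number() OVER (ORDER BY id) AS rn
    FROM   tbl
    )
SELECT id, col1
FROM   cte c
WHERE  NOT EXISTS (
    SELECT *
    FROM   cte
    WHERE  col1 = c.col1
    AND    rn   = c.rn - 1
    )
ORDER  BY id
""").fetchall()

print(result)
# [(1, 6050000), (4, 6060000), (9, 6050000), (10, 6000000)]
```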

Another way of doing the job without those niceties (should work for you):

SELECT id, col1
FROM   tbl t
WHERE  (
    SELECT col1 = t.col1
    FROM   tbl
    WHERE  id < t.id
    ORDER  BY id DESC
    LIMIT  1) IS NOT TRUE
ORDER  BY id;
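The same gapped data can be run through this last query as a sanity check. Again a sketch with Python's built-in sqlite3 module: the data set (ids 2 and 5 missing) is an assumption for illustration, and `IS NOT TRUE` needs SQLite 3.23 or later. For the very first row the subquery finds no earlier row and yields NULL, which `IS NOT TRUE` treats as "keep".

```python
import sqlite3

# Question's sample data with ids 2 and 5 removed to create gaps (illustrative).
rows = [(1, 6050000), (3, 6050000), (4, 6060000), (6, 6060000),
        (7, 6060000), (8, 6060000), (9, 6050000), (10, 6000000),
        (11, 6000000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# The subquery compares col1 of the closest earlier row with the current row.
# It is NULL when there is no earlier row, so the first row is kept as well.
result = conn.execute("""
SELECT id, col1
FROM   tbl t
WHERE  (
    SELECT col1 = t.col1
    FROM   tbl
    WHERE  id < t.id
    ORDER  BY id DESC
    LIMIT  1) IS NOT TRUE
ORDER  BY id
""").fetchall()

print(result)
# [(1, 6050000), (4, 6060000), (9, 6050000), (10, 6000000)]
```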

Tags: sql, sqlite, duplicates, ms-access

Ariyan
