Filtering out duplicate subsequent records in a SELECT

Question

(PostgreSQL 8.4) Table "trackingMessages" stores tracking events between mobile devices (tm_nl_mobileid) and fixed devices (tm_nl_fixedId).

CREATE TABLE trackingMessages
(
  tm_id SERIAL PRIMARY KEY,           -- PK
  tm_nl_mobileId INTEGER,             -- FK to mobile
  tm_nl_fixedId INTEGER,              -- FK to fixed
  tm_date INTEGER,                    -- Network time
  tm_messageType INTEGER,             -- 0=disconnect, 1=connect
  CONSTRAINT tm_unique_row
    UNIQUE (tm_nl_mobileId, tm_nl_fixedId, tm_date, tm_messageType)
);

Problem here is that it's possible that the same mobile will connect to the same fixed twice (or more times) subsequently. I don't want to see the subsequent ones, but it's OK to see a mobile connected to a same fixed at a later date, provided there was a connection to a different fixed in between.

I think I'm close but not quite. I've been using the following CTE (found here on Stack Overflow)

WITH cte AS 
(
  SELECT tm_nl_fixedid, tm_date, Row_number() OVER (
    partition BY tm_nl_fixedid
    ORDER BY tm_date ASC
  ) RN 
  FROM   trackingMessages
) 
SELECT * FROM cte 
  WHERE tm_nl_mobileid = 150 AND tm_messagetype = 1
  ORDER BY tm_date;

Gives me the following results

32;1316538756;1
21;1316539069;1
32;1316539194;2
32;1316539221;3
21;1316539235;2

The problem here is that the last column should be 1, 1, 1, 2, 1, because that third "32" is in fact a duplicate tracking event (twice in a row at the same fixed) and that last connection to "21" is OK because "32" was in between.

Please don't suggest a cursor, this is what I am currently trying to move away from. The cursor solution does work, but it's too slow given the amount of records I have to deal with. I'd much rather fix the CTE and only select where RN = 1 ... unless you have a better idea!

Nikola Markovinović · Accepted Answer

Well, you're not that close because row_number() cannot track sequences by two groups at the same time. PARTITION BY tm_nl_fixedid ORDER BY date RESTART ON GAP does not exist, there's no such thing.

Itzik Ben-Gan has a solution for the islands and gaps problem you are facing (several solutions, actually). The idea is to order rows by the main criteria (date) and then by partitioning criteria + main criteria. Difference between ordinals will remain the same as they belong to the same partitioning criteria and date series.

with cte as
(
  select *,
      -- While order by date and order by something-else, date
      -- run along, they belong to the same sequence
         row_number() over (order by tm_date)
       - row_number() over (order by tm_nl_fixedid, tm_date) grp
    from trackingMessages
)
select *,
    -- Now we can get ordinal number grouped by each sequence
       row_number() over (partition by tm_nl_fixedid, grp
                          order by tm_date) rn
  from cte
 order by tm_date

Here is Sql Fiddle with example.

And here is chapter 5 of Sql Server MVP Deep Dives with several solutions to islands and gaps problem.

Erwin Brandstetter · Answer

This should be simpler with the window function lag():

WITH cte AS (
   SELECT *
         ,lag(tm_nl_fixedId) OVER (PARTITION BY tm_nl_mobileId
                                   ORDER BY tm_date) AS last_fixed
   FROM   trackingmessages
   )
SELECT *
FROM   cte
WHERE  last_fixed IS DISTINCT FROM tm_nl_fixedId
ORDER  BY tm_date

Explain

In the CTE, lag() gets the last fixed device to which a mobile connected (NULL for the first row per mobile - that's why I use IS DISTINCT FROM later, see a different approach here).
Then simply exclude all rows where the last fixed device was the same as this one, thereby excluding all "subsequent ones". All done.

Filtering out duplicate subsequent records in a SELECT

Tags:

sql

postgresql

window-functions

gaps-and-islands

denpanosekai

2 Answers

Nikola Markovinović

Explain

Erwin Brandstetter

Recent Activity

Donate For Us

Filtering out duplicate subsequent records in a SELECT

Tags:

sql

postgresql

window-functions

gaps-and-islands

denpanosekai

2 Answers

Nikola Markovinović

Explain

Erwin Brandstetter

Related questions

Recent Activity

Donate For Us