
Need approach for working with small subsets of a large dataset

I am facing a conceptual problem that I am having a hard time getting past, and I am hoping the SO folks can nudge me in the right direction.

I am in the process of doing some ETL work where the source data is very similar and very large. I am loading it into a table that is intended for replication, and I only want the most basic information in this target table.

My source table looks something like this:

[image: source table, showing rows per TrackingId with consecutive duplicate statuses such as InTransit]

I need my target table to reflect it as such:

[image: target table, with consecutive duplicate statuses collapsed to a single row]

As you can see, I didn't duplicate the InTransit status where it was duplicated in the source table. The steps I am trying to figure out how to achieve are:

  1. Get any new distinct rows entered since the last time the query ran. (Easy; see the sketch after this list.)
  2. For each TrackingId, check whether each new status is already the most recent status in the target; if it is, disregard it, otherwise insert it. This also means I have to start at the earliest of the new statuses and work forward from there. (I have no clue how I'll do this.)
  3. Do this every 15 minutes so that statuses are kept very recent, which means step #2 must be performant.
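
To be concrete, step 1 on its own is just something like this (a simplified sketch; Source and the @lastRun watermark are stand-in names):

SELECT  DISTINCT
        trackingId, status, statusDate
FROM    Source
WHERE   statusDate > @lastRun   -- rows that arrived since the previous run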

My source table could easily consist of 100k+ rows, and needing to run this every 15 minutes means the process has to be very performant, which is why I am trying hard to avoid cursors.

Right now the only way I can see to do this is with a CLR sproc, but I suspect there are better ways, so I am hoping you guys can nudge me in the right direction.

I am sure I am leaving out something you may need, so please let me know what info you need and I'll happily provide it.

Thank you in advance!

EDIT: OK, I wasn't explicit enough in my question. My source table is going to contain multiple TrackingIds. It may be up to 100k+ rows containing multiple TrackingIds and multiple statuses for each TrackingId. I have to update the target table as above for each individual TrackingId, but my source will be an amalgam of TrackingIds.

asked Jan 28 '26 by joshlrogers


1 Answer

Here's a solution without self-joins:

WITH    q AS
        (
        SELECT  *,
                -- rn: position in the full history; rns: position among rows of the same status
                ROW_NUMBER() OVER (ORDER BY statusDate) AS rn,
                ROW_NUMBER() OVER (PARTITION BY status ORDER BY statusDate) AS rns
        FROM    tracking
        WHERE   trackingId = @id
        ),
        qs AS
        (
        -- rn - rns is constant within each run of consecutive identical statuses,
        -- so (status, rn - rns) identifies one "island" per run
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY status, rn - rns ORDER BY statusDate) AS rnn
        FROM    q
        )
SELECT  *
FROM    qs
WHERE   rnn = 1          -- keep only the first row of each island
ORDER BY
        statusDate
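
Per the question's edit, the source holds many TrackingIds at once. A sketch of one way to cover them all in a single pass (the trackingId partitioning is an assumed extension, not part of the query above):

WITH    q AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY trackingId ORDER BY statusDate) AS rn,
                ROW_NUMBER() OVER (PARTITION BY trackingId, status ORDER BY statusDate) AS rns
        FROM    tracking
        ),
        qs AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY trackingId, status, rn - rns ORDER BY statusDate) AS rnn
        FROM    q
        )
SELECT  *
FROM    qs
WHERE   rnn = 1
ORDER BY
        trackingId, statusDate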

Here's a script to check:

DECLARE @tracking TABLE
        (
        id INT NOT NULL PRIMARY KEY,
        trackingId INT NOT NULL,
        status INT,
        statusDate DATETIME
        )

INSERT
INTO    @tracking
SELECT  1, 1, 1, DATEADD(d, 1, '2010-01-01')
UNION ALL
SELECT  2, 1, 2, DATEADD(d, 2, '2010-01-01')
UNION ALL
SELECT  3, 1, 2, DATEADD(d, 3, '2010-01-01')
UNION ALL
SELECT  4, 1, 2, DATEADD(d, 4, '2010-01-01')
UNION ALL
SELECT  5, 1, 3, DATEADD(d, 5, '2010-01-01')
UNION ALL
SELECT  6, 1, 3, DATEADD(d, 6, '2010-01-01')
UNION ALL
SELECT  7, 1, 4, DATEADD(d, 7, '2010-01-01')
UNION ALL
SELECT  8, 1, 2, DATEADD(d, 8, '2010-01-01')
UNION ALL
SELECT  9, 1, 2, DATEADD(d, 9, '2010-01-01')
UNION ALL
SELECT  10, 1, 1, DATEADD(d, 10, '2010-01-01')
;
WITH    q AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (ORDER BY statusDate) AS rn,
                ROW_NUMBER() OVER (PARTITION BY status ORDER BY statusDate) AS rns
        FROM    @tracking
        ),
        qs AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY status, rn - rns ORDER BY statusDate) AS rnn
        FROM    q
        )
SELECT  *
FROM    qs
WHERE   rnn = 1
ORDER BY
        statusDate
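
With this data the query should return the first row of each run of identical consecutive statuses: ids 1, 2, 5, 7, 8, and 10 (statuses 1, 2, 3, 4, 2, 1).

For the 15-minute incremental load itself (steps 2 and 3 of the question), a rough sketch of how the islands query might feed the target table. Source, Target, and the @lastRun watermark are hypothetical stand-ins, and the OUTER APPLY check against the target's latest status is an assumption, not part of the original answer:

WITH    q AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY trackingId ORDER BY statusDate) AS rn,
                ROW_NUMBER() OVER (PARTITION BY trackingId, status ORDER BY statusDate) AS rns
        FROM    Source
        WHERE   statusDate > @lastRun              -- only rows that arrived since the previous run
        ),
        islands AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY trackingId, status, rn - rns ORDER BY statusDate) AS rnn
        FROM    q
        )
INSERT
INTO    Target (trackingId, status, statusDate)
SELECT  i.trackingId, i.status, i.statusDate
FROM    islands i
OUTER APPLY
        (
        SELECT TOP 1 t.status
        FROM    Target t
        WHERE   t.trackingId = i.trackingId
        ORDER BY t.statusDate DESC
        ) latest
WHERE   i.rnn = 1                                  -- first row of each new island
        AND
        (
        i.rn > 1                                   -- island starts inside the new batch, so it is a real change
        OR latest.status IS NULL                   -- no history for this TrackingId yet
        OR latest.status <> i.status               -- differs from the target's current latest status
        )

An index on Target (trackingId, statusDate) should keep the TOP 1 lookup cheap at the 100k-row scale the question mentions, with no cursors involved.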
answered Jan 29 '26 by Quassnoi


