Retrieving failed jobs from a table with retry details (id and retry count)

Tags:

sql-server

My apologies for a non-intuitive thread title.

I have a table, Jobs, where each row represents a maintenance task performed by a computer program. It has this design:

CREATE TABLE Jobs (
    JobId bigint PRIMARY KEY,
    ...
    Status int NOT NULL,
    OriginalJobId bigint NULL
)

When a Job is created/started, its row is added to the table and its status is 0. When a job is completed its status is updated to 1 and when a job fails its status is updated to 2. When a job fails, the job-manager will retry the job by inserting a new row into the Jobs table by duplicating the details of the failed job and reset the Status to 0 and use the original (failed) JobId in OriginalJobId for tracking purposes. If this re-attempt fails then it should be tried again up to 3 times, each subsequent retry will maintain the original JobId in the OriginalJobId column.

My problem is trying to formulate a query to get the current set of Jobs that have failed and get their retry count.

Here's a sample data in the table:

JobId | Status | OriginalJobId
    1,       1,           NULL    -- Successful initial job
    2,       0,           NULL    -- Pending initial job
    3,       2,           NULL    -- Failed initial job
    4,       1,              3    -- Successful retry of Job 3
    5,       2,           NULL    -- Failed initial job
    6,       2,              5    -- Failed retry 1 of Job 5
    7,       2,              5    -- Failed retry 2 of Job 5 -- should be tried again for 1 more time
    8,       2,           NULL    -- Failed initial job
    9,       2,              8    -- Failed retry 1 of Job 8
   10,       2,              8    -- Failed retry 2 of Job 8
   11,       2,              8    -- Failed retry 3 of Job 8 -- don't try again
   12,       2,           NULL    -- Failed initial job

My query needs to return this:

 JobId | RetryCount
     5,           2
    12,           0

Notice how Job 3 isn't included because its last retry succeeded (status 1). Similarly Job 8 is excluded because the number of retries exceeds the limit of 3. Job 5 is included because it still failed and has had only 2 retries, and Job 12 is included and hasn't yet had any retries.

I'm thinking the solution would be something like this:

SELECT
    J1.JobId
FROM
    Jobs AS J1
    LEFT OUTER JOIN Jobs AS J2 ON J1.JobId = J2.OriginalJobId
WHERE
    J1.Status = 2

...but I can't think of how to get the RetryCount data.

Here is a SQLFiddle I created for this problem, with one of the solutions below:

http://sqlfiddle.com/#!6/8765f

Update

Here is an updated SQLFiddle that compares the 5 solutions provided so far (I added an extra HAVING clause to remove jobs that had more than 3 retries)

http://sqlfiddle.com/#!6/8765f/23

Performance-wise, I think GarethD's answer is the best as it has the simplest execution plan and tends to finish with the fastest time in SqlFiddle.

My production table has about 14,000,000 rows so obviously the results will be different there. I'll try each out in production and see which is the fastest then and choose an answer accordingly.

Thank you all for the help!

292

asked Nov 21 '14 09:11

Dai

2 Answers

The following returns the result required:

SELECT  J1.JobId,
        Retries = COUNT(J2.JobId)
FROM    Jobs AS J1
        INNER JOIN Jobs AS J2 
            ON J1.JobId = J2.OriginalJobId
WHERE   J1.Status = 2
GROUP BY J1.JobId
HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;

I have changed it to an INNER join so that only jobs that have been retried are included, although this could feasibly be changed back to a LEFT join to include failed jobs that have not been retried yet. I also added a HAVING clause to exclude any jobs that have not failed when they have been retried.

EDIT

As mentioned above, using INNER JOIN will mean that you only return jobs that have been retried, to get all failed jobs you need to use a LEFT JOIN, this will mean that retries are returned as failed jobs, so I have added an additional predicate J1.OriginalJobId IS NULL to ensure only the original jobs are returned:

SELECT  J1.JobId,
        Retries = COUNT(J2.JobId)
FROM    Jobs AS J1
        LEFT JOIN Jobs AS J2 
            ON J1.JobId = J2.OriginalJobId
WHERE   J1.Status = 2
AND     J1.OriginalJobId IS NULL
GROUP BY J1.JobId
HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;

Example on SQL Fiddle

181

answered Oct 12 '22 08:10

GarethD

This should do the job. It does a COALESCE to combine JobId and OriginalJobId, gets the retry count by grouping them up then excluding any jobs that have a status of 1.

SELECT COALESCE(j.OriginalJobId, j.JobId) JobId, 
       COUNT(*)-1 RetryCount
FROM Jobs j
WHERE j.[Status] = 2
AND NOT EXISTS  (SELECT 1
                FROM Jobs 
                WHERE COALESCE(Jobs.OriginalJobId, Jobs.JobId) = COALESCE(j.OriginalJobId, j.JobId)
                AND Jobs.[Status] = 1)
GROUP BY COALESCE(j.OriginalJobId, j.JobId), j.[Status]

answered Oct 12 '22 10:10

DavidG

Related questions
                            
                                Find all foreign keys constraints in database referencing a certain primary key [duplicate]
                            
                                Query returns "LIKE" results despite not having wildcards?
                            
                                sql linked server join query
                            
                                Display the results of a SELECT COUNT(*) in PHP
                            
                                how to create a Foreign-Key constraint to a subset of the rows of a table?
                            
                                sql server displaying missing dates
                            
                                How to see queries executing on Informix DB
                            
                                MySQL LOAD DATA Error (Errcode: 2 - "No such file or directory")
                            
                                SQL Server (0x80131904): String or binary data would be truncated [duplicate]
                            
                                How to order the fields getting aggregated by XMLAGG function?
                            
                                Is there sql WITH clause equivalent in hive?
                            
                                How to use GROUP BY to concatenate strings while joining multiple tables?
                            
                                Remove all duplicates from Redshift database
                            
                                How to find most overlapping time period with date ranges
                            
                                Find Partition Schema Definitions in SQL Server Database
                            
                                Is there a standard approach for dealing with unordered arrays (sets) in PostgreSQL?
                            
                                Trying to run SparkSQL over Spark Streaming
                            
                                SQL to_char Currency Formatting
                            
                                Repository - order by in native query not working
                            
                                ORA-01756: quoted string not properly terminated when I try to execute my code

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With