Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieving failed jobs from a table with retry details (id and retry count)

Tags:

sql

sql-server

My apologies for a non-intuitive thread title.

I have a table, Jobs, where each row represents a maintenance task performed by a computer program. It has this design:

CREATE TABLE Jobs (
    JobId bigint PRIMARY KEY,
    ...
    Status int NOT NULL,
    OriginalJobId bigint NULL
)

When a Job is created/started, its row is added to the table and its status is 0. When a job is completed its status is updated to 1 and when a job fails its status is updated to 2. When a job fails, the job-manager will retry the job by inserting a new row into the Jobs table by duplicating the details of the failed job and reset the Status to 0 and use the original (failed) JobId in OriginalJobId for tracking purposes. If this re-attempt fails then it should be tried again up to 3 times, each subsequent retry will maintain the original JobId in the OriginalJobId column.

My problem is trying to formulate a query to get the current set of Jobs that have failed and get their retry count.

Here's a sample data in the table:

JobId | Status | OriginalJobId
    1,       1,           NULL    -- Successful initial job
    2,       0,           NULL    -- Pending initial job
    3,       2,           NULL    -- Failed initial job
    4,       1,              3    -- Successful retry of Job 3
    5,       2,           NULL    -- Failed initial job
    6,       2,              5    -- Failed retry 1 of Job 5
    7,       2,              5    -- Failed retry 2 of Job 5 -- should be tried again for 1 more time
    8,       2,           NULL    -- Failed initial job
    9,       2,              8    -- Failed retry 1 of Job 8
   10,       2,              8    -- Failed retry 2 of Job 8
   11,       2,              8    -- Failed retry 3 of Job 8 -- don't try again
   12,       2,           NULL    -- Failed initial job

My query needs to return this:

 JobId | RetryCount
     5,           2
    12,           0

Notice how Job 3 isn't included because its last retry succeeded (status 1). Similarly Job 8 is excluded because the number of retries exceeds the limit of 3. Job 5 is included because it still failed and has had only 2 retries, and Job 12 is included and hasn't yet had any retries.

I'm thinking the solution would be something like this:

SELECT
    J1.JobId
FROM
    Jobs AS J1
    LEFT OUTER JOIN Jobs AS J2 ON J1.JobId = J2.OriginalJobId
WHERE
    J1.Status = 2

...but I can't think of how to get the RetryCount data.

Here is a SQLFiddle I created for this problem, with one of the solutions below:

http://sqlfiddle.com/#!6/8765f

Update

Here is an updated SQLFiddle that compares the 5 solutions provided so far (I added an extra HAVING clause to remove jobs that had more than 3 retries)

http://sqlfiddle.com/#!6/8765f/23

Performance-wise, I think GarethD's answer is the best as it has the simplest execution plan and tends to finish with the fastest time in SqlFiddle.

My production table has about 14,000,000 rows so obviously the results will be different there. I'll try each out in production and see which is the fastest then and choose an answer accordingly.

Thank you all for the help!

like image 292
Dai Avatar asked Nov 21 '14 09:11

Dai


People also ask

How do I get a list of failed jobs in SQL Server?

Open SQL Server Management Studio (SSMS) and connect to the corresponding database instance. Navigate to Management -> SQL Server Logs -> SQL job name. Find the job failure event and review the log details.

How do I make my SQL job fail?

Add a new step with name [Failure Notification] at the end of the job, and mark down the job step number [N], set both "On success action" and "On failure action" to "Quit the job reporting failure", as this step is used for reporting job failure, so we need the job history to show failure if this step is ever executed ...

How do I find the SQL Server Job ID?

Just use (JOBID). Since SQL Server 2005 SP1 you have to use macro like $(ESCAPE_NONE(JOBID)). No problem. The "N'" I think makes an implicit conversion to NVARCHAR of the (JOBID)...

How can check job query status in SQL Server?

To view job activity In Object Explorer, connect to an instance of the SQL Server Database Engine, and then expand that instance. Expand SQL Server Agent. Right-click Job Activity Monitor and click View Job Activity. In the Job Activity Monitor, you can view details about each job that is defined for this server.


2 Answers

The following returns the result required:

SELECT  J1.JobId,
        Retries = COUNT(J2.JobId)
FROM    Jobs AS J1
        INNER JOIN Jobs AS J2 
            ON J1.JobId = J2.OriginalJobId
WHERE   J1.Status = 2
GROUP BY J1.JobId
HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;

I have changed it to an INNER join so that only jobs that have been retried are included, although this could feasibly be changed back to a LEFT join to include failed jobs that have not been retried yet. I also added a HAVING clause to exclude any jobs that have not failed when they have been retried.


EDIT

As mentioned above, using INNER JOIN will mean that you only return jobs that have been retried, to get all failed jobs you need to use a LEFT JOIN, this will mean that retries are returned as failed jobs, so I have added an additional predicate J1.OriginalJobId IS NULL to ensure only the original jobs are returned:

SELECT  J1.JobId,
        Retries = COUNT(J2.JobId)
FROM    Jobs AS J1
        LEFT JOIN Jobs AS J2 
            ON J1.JobId = J2.OriginalJobId
WHERE   J1.Status = 2
AND     J1.OriginalJobId IS NULL
GROUP BY J1.JobId
HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;

Example on SQL Fiddle

like image 181
GarethD Avatar answered Oct 12 '22 08:10

GarethD


This should do the job. It does a COALESCE to combine JobId and OriginalJobId, gets the retry count by grouping them up then excluding any jobs that have a status of 1.

SELECT COALESCE(j.OriginalJobId, j.JobId) JobId, 
       COUNT(*)-1 RetryCount
FROM Jobs j
WHERE j.[Status] = 2
AND NOT EXISTS  (SELECT 1
                FROM Jobs 
                WHERE COALESCE(Jobs.OriginalJobId, Jobs.JobId) = COALESCE(j.OriginalJobId, j.JobId)
                AND Jobs.[Status] = 1)
GROUP BY COALESCE(j.OriginalJobId, j.JobId), j.[Status]
like image 3
DavidG Avatar answered Oct 12 '22 10:10

DavidG