My apologies for a non-intuitive thread title.
I have a table, Jobs
, where each row represents a maintenance task performed by a computer program. It has this design:
CREATE TABLE Jobs (
JobId bigint PRIMARY KEY,
...
Status int NOT NULL,
OriginalJobId bigint NULL
)
When a Job is created/started, its row is added to the table and its status is 0
. When a job is completed its status is updated to 1
and when a job fails its status is updated to 2
. When a job fails, the job-manager will retry the job by inserting a new row into the Jobs table by duplicating the details of the failed job and reset the Status
to 0
and use the original (failed) JobId in OriginalJobId
for tracking purposes. If this re-attempt fails then it should be tried again up to 3 times, each subsequent retry will maintain the original JobId
in the OriginalJobId
column.
My problem is trying to formulate a query to get the current set of Jobs that have failed and get their retry count.
Here's a sample data in the table:
JobId | Status | OriginalJobId
1, 1, NULL -- Successful initial job
2, 0, NULL -- Pending initial job
3, 2, NULL -- Failed initial job
4, 1, 3 -- Successful retry of Job 3
5, 2, NULL -- Failed initial job
6, 2, 5 -- Failed retry 1 of Job 5
7, 2, 5 -- Failed retry 2 of Job 5 -- should be tried again for 1 more time
8, 2, NULL -- Failed initial job
9, 2, 8 -- Failed retry 1 of Job 8
10, 2, 8 -- Failed retry 2 of Job 8
11, 2, 8 -- Failed retry 3 of Job 8 -- don't try again
12, 2, NULL -- Failed initial job
My query needs to return this:
JobId | RetryCount
5, 2
12, 0
Notice how Job 3
isn't included because its last retry succeeded (status 1
). Similarly Job 8
is excluded because the number of retries exceeds the limit of 3. Job 5
is included because it still failed and has had only 2 retries, and Job 12
is included and hasn't yet had any retries.
I'm thinking the solution would be something like this:
SELECT
J1.JobId
FROM
Jobs AS J1
LEFT OUTER JOIN Jobs AS J2 ON J1.JobId = J2.OriginalJobId
WHERE
J1.Status = 2
...but I can't think of how to get the RetryCount data.
Here is a SQLFiddle I created for this problem, with one of the solutions below:
http://sqlfiddle.com/#!6/8765f
Here is an updated SQLFiddle that compares the 5 solutions provided so far (I added an extra HAVING
clause to remove jobs that had more than 3 retries)
http://sqlfiddle.com/#!6/8765f/23
Performance-wise, I think GarethD's answer is the best as it has the simplest execution plan and tends to finish with the fastest time in SqlFiddle.
My production table has about 14,000,000 rows so obviously the results will be different there. I'll try each out in production and see which is the fastest then and choose an answer accordingly.
Thank you all for the help!
Open SQL Server Management Studio (SSMS) and connect to the corresponding database instance. Navigate to Management -> SQL Server Logs -> SQL job name. Find the job failure event and review the log details.
Add a new step with name [Failure Notification] at the end of the job, and mark down the job step number [N], set both "On success action" and "On failure action" to "Quit the job reporting failure", as this step is used for reporting job failure, so we need the job history to show failure if this step is ever executed ...
Just use (JOBID). Since SQL Server 2005 SP1 you have to use macro like $(ESCAPE_NONE(JOBID)). No problem. The "N'" I think makes an implicit conversion to NVARCHAR of the (JOBID)...
To view job activity In Object Explorer, connect to an instance of the SQL Server Database Engine, and then expand that instance. Expand SQL Server Agent. Right-click Job Activity Monitor and click View Job Activity. In the Job Activity Monitor, you can view details about each job that is defined for this server.
The following returns the result required:
SELECT J1.JobId,
Retries = COUNT(J2.JobId)
FROM Jobs AS J1
INNER JOIN Jobs AS J2
ON J1.JobId = J2.OriginalJobId
WHERE J1.Status = 2
GROUP BY J1.JobId
HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;
I have changed it to an INNER
join so that only jobs that have been retried are included, although this could feasibly be changed back to a LEFT
join to include failed jobs that have not been retried yet. I also added a HAVING
clause to exclude any jobs that have not failed when they have been retried.
EDIT
As mentioned above, using INNER JOIN
will mean that you only return jobs that have been retried, to get all failed jobs you need to use a LEFT JOIN
, this will mean that retries are returned as failed jobs, so I have added an additional predicate J1.OriginalJobId IS NULL
to ensure only the original jobs are returned:
SELECT J1.JobId,
Retries = COUNT(J2.JobId)
FROM Jobs AS J1
LEFT JOIN Jobs AS J2
ON J1.JobId = J2.OriginalJobId
WHERE J1.Status = 2
AND J1.OriginalJobId IS NULL
GROUP BY J1.JobId
HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;
Example on SQL Fiddle
This should do the job. It does a COALESCE to combine JobId
and OriginalJobId
, gets the retry count by grouping them up then excluding any jobs that have a status of 1.
SELECT COALESCE(j.OriginalJobId, j.JobId) JobId,
COUNT(*)-1 RetryCount
FROM Jobs j
WHERE j.[Status] = 2
AND NOT EXISTS (SELECT 1
FROM Jobs
WHERE COALESCE(Jobs.OriginalJobId, Jobs.JobId) = COALESCE(j.OriginalJobId, j.JobId)
AND Jobs.[Status] = 1)
GROUP BY COALESCE(j.OriginalJobId, j.JobId), j.[Status]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With