I have the following data in a SQL Table:
I need to query the data so I can get a list of missing "familyid" per employee.
For example, for Employee 1021 I should get the IDs missing from the sequence, 2 and 5, and for Employee 1027 I should get the missing IDs 1 and 6.
Any clue on how to query that?
Appreciate any help.
To find out which values are missing, we need to know where the gaps are in the sequential numbering. You can use generate_series (available in PostgreSQL) to generate the numbers from 1 to the highest id in your table, then select the generated numbers that do not appear among the ids.
Finding the first missing value
I would use the ROW_NUMBER window function to assign the "correct" sequence ID number. Assuming that the sequence ID restarts every time the employee changes:
-- The "- 1" makes the expected sequence start at 0; drop it if familyid starts at 1
SELECT
    e.id,
    e.name,
    e.employee_number,
    e.relation,
    e.familyid,
    ROW_NUMBER() OVER(PARTITION BY e.employee_number ORDER BY e.familyid) - 1 AS sequenceid
FROM employee_members e
Then, I would filter the result set to only include the rows with mismatching sequence IDs:
SELECT *
FROM (
    SELECT
        e.id,
        e.name,
        e.employee_number,
        e.relation,
        e.familyid,
        ROW_NUMBER() OVER(PARTITION BY e.employee_number ORDER BY e.familyid) - 1 AS sequenceid
    FROM employee_members e
) a
-- Rows at or after the first gap no longer match their expected position
WHERE a.familyid <> a.sequenceid
From there, you can group by employee_number to find the first missing sequence ID for each employee:
SELECT
    a.employee_number,
    -- The expected position of the first mismatching row is the first missing ID
    MIN(a.sequenceid) AS first_missing
FROM (
    SELECT
        e.id,
        e.name,
        e.employee_number,
        e.relation,
        e.familyid,
        ROW_NUMBER() OVER(PARTITION BY e.employee_number ORDER BY e.familyid) - 1 AS sequenceid
    FROM employee_members e
) a
WHERE a.familyid <> a.sequenceid
GROUP BY a.employee_number
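To try the first-missing query end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are hypothetical, since the original table is not shown, and they assume that familyid starts at 0 for each employee. SQLite 3.25 or newer is assumed for window function support.

```python
import sqlite3

# Hypothetical sample data: (employee_number, familyid).
# Assumption: familyid starts at 0 for every employee.
ROWS = [
    (1021, 0), (1021, 1), (1021, 3), (1021, 4), (1021, 6),
    (1027, 0), (1027, 2), (1027, 3), (1027, 4), (1027, 5), (1027, 7),
]

FIRST_MISSING_SQL = """
SELECT a.employee_number, MIN(a.sequenceid) AS first_missing
FROM (
    SELECT e.employee_number,
           e.familyid,
           ROW_NUMBER() OVER (PARTITION BY e.employee_number
                              ORDER BY e.familyid) - 1 AS sequenceid
    FROM employee_members e
) a
WHERE a.familyid <> a.sequenceid
GROUP BY a.employee_number
ORDER BY a.employee_number
"""


def first_missing(rows):
    """Return (employee_number, first missing familyid) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employee_members "
            "(employee_number INTEGER, familyid INTEGER)"
        )
        conn.executemany("INSERT INTO employee_members VALUES (?, ?)", rows)
        return conn.execute(FIRST_MISSING_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(first_missing(ROWS))
```

Employees with no gap produce no row at all, because every one of their rows is filtered out by the WHERE clause.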
Finding all the missing values
Extending the previous query, we can detect a missing value every time the difference between familyid and sequenceid changes:
-- Warning: this is totally untested :-/
-- Reports one ID per gap: the last missing ID before the sequence resumes
SELECT
    b.employee_number,
    MIN(b.familyid) - 1 AS missing
FROM (
    SELECT
        a.*,
        a.familyid - a.sequenceid AS displacement
    FROM (
        SELECT
            e.*,
            ROW_NUMBER() OVER(PARTITION BY e.employee_number ORDER BY e.familyid) - 1 AS sequenceid
        FROM employee_members e
    ) a
) b
WHERE b.displacement <> 0
GROUP BY
    b.employee_number,
    b.displacement
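As a rough check of the displacement idea, the sketch below groups rows by employee and displacement and takes MIN(familyid) - 1 in each group as the missing ID, that is, the ID just before each displaced run begins. The sample rows are hypothetical and assume familyid starts at 0; SQLite 3.25 or newer is assumed for ROW_NUMBER.

```python
import sqlite3

# Hypothetical sample data: (employee_number, familyid), starting at 0.
ROWS = [
    (1021, 0), (1021, 1), (1021, 3), (1021, 4), (1021, 6),
    (1027, 0), (1027, 2), (1027, 3), (1027, 4), (1027, 5), (1027, 7),
]

ALL_MISSING_SQL = """
SELECT b.employee_number, MIN(b.familyid) - 1 AS missing
FROM (
    SELECT a.*, a.familyid - a.sequenceid AS displacement
    FROM (
        SELECT e.employee_number,
               e.familyid,
               ROW_NUMBER() OVER (PARTITION BY e.employee_number
                                  ORDER BY e.familyid) - 1 AS sequenceid
        FROM employee_members e
    ) a
) b
WHERE b.displacement <> 0
GROUP BY b.employee_number, b.displacement
ORDER BY b.employee_number, missing
"""


def missing_by_displacement(rows):
    """Return one (employee_number, missing familyid) pair per gap."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employee_members "
            "(employee_number INTEGER, familyid INTEGER)"
        )
        conn.executemany("INSERT INTO employee_members VALUES (?, ?)", rows)
        return conn.execute(ALL_MISSING_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(missing_by_displacement(ROWS))
```

One limitation of this grouping: a gap wider than one ID (for example 0, 1, 4) yields a single row holding only the last missing ID of that gap, so it fits data where IDs go missing one at a time.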
Here is one approach. Calculate the maximum family id for each employee. Then join this to a list of numbers up to the maximum family id. The result has one row for each employee and expected family id.
Do a left outer join from this back to the original data, on the familyid and the number. Where nothing matches, those are the missing values:
-- Recursive CTE producing the numbers 1 through 20
with nums as (
    select 1 as n
    union all
    select n + 1
    from nums
    where n < 20
)
select en.employee, n.n as MissingFamilyId
from (select employee, max(familyid) as maxfi
      from t
      group by employee
     ) en join
     nums n
     on n.n <= en.maxfi left outer join
     t
     on t.employee = en.employee and
        t.familyid = n.n
-- No matching row in t means this expected id is missing
where t.employee is null;
Note that this will not work when the missing familyid is the last number in the sequence, because the list of expected IDs only runs up to each employee's highest existing familyid. But it might be the best that you can do with your data structure.
Also, the above query assumes that there are at most 20 family members.
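The numbers-table approach can be sketched in a runnable form as follows. This version uses SQLite from Python, so the CTE needs the RECURSIVE keyword, and it bounds the number list by the largest familyid in the table rather than a fixed 20; both are choices made for this sketch, not part of the answer above. The table name t and column employee follow the query above, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (employee, familyid).
ROWS = [
    (1021, 0), (1021, 1), (1021, 3), (1021, 4), (1021, 6),
    (1027, 0), (1027, 2), (1027, 3), (1027, 4), (1027, 5), (1027, 7),
]

MISSING_SQL = """
WITH RECURSIVE nums(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1
    FROM nums
    WHERE n < (SELECT MAX(familyid) FROM t)  -- stop at the largest id in use
)
SELECT en.employee, nums.n AS missing_familyid
FROM (SELECT employee, MAX(familyid) AS maxfi
      FROM t
      GROUP BY employee) en
JOIN nums ON nums.n <= en.maxfi
LEFT JOIN t ON t.employee = en.employee AND t.familyid = nums.n
WHERE t.employee IS NULL  -- expected id with no matching row
ORDER BY en.employee, nums.n
"""


def missing_family_ids(rows):
    """Return every (employee, missing familyid) pair between 1 and the max."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (employee INTEGER, familyid INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(MISSING_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(missing_family_ids(ROWS))
```

Because every expected number is checked individually, this form also lists each ID inside a gap that spans several consecutive values. As in the original query, the list starts at 1, so an ID of 0 is never checked.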