Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find Missing Dates in Data

I receive a daily txt file of data from multiple antennas. The naming convention for the files is:

Unique Antenna ID + year + month + day + Random 3 digit number

I parsed the file names and created a table like this one:

AntennaID    fileyear    filemonth    fileday    filenumber     filename
0000         2016        09           22         459            000020160922459.txt
0000         2016        09           21         981            000020160921981.txt
0000         2016        09           20         762            000020160920762.txt
0001         2016        09           22         635            000120160922635.txt
.
.
.
etc. (200k rows)

Sometimes the antennas send either more than 1 file or no file at all. The unique 3 digit filenumber distinguishes the files if more than 1 gets sent, however I'm trying to find the days when a file wasn't sent.

I've tried a couple groupby statements to compare the amount of datafiles in a given month and to see if it matches up w/ the days in that month - but the problem is sometimes the antennas send more than 1 file per day which could artificially make up for a "missing" file if we're just comparing the count.

I'm seeking a more robust method for finding the dates or date ranges for the missing files. I've looked into the Partition and Over functions and feel as there might be potential there, but I'm unsure as to how to use them as I'm fairly new to SQL.

I'm using Microsoft SQL Server 2016

like image 910
mk8efz Avatar asked Jan 05 '23 07:01

mk8efz


1 Answers

You can use a common table expression (or cte for short) to create a table of dates. You can then join from this table to your Antenna data and look for dates that return a null value:

declare @MinDate date = getdate()-50
declare @MaxDate date = getdate()

;with Dates as
(
select @MinDate as DateValue

union all

select dateadd(d,1,DateValue)
from Dates
where DateValue < @MaxDate
)
select d.DateValue
from Dates d
    left join AntennaData a
        on(d.DateValue = cast(cast(a.fileyear as nvarchar(4)) + cast(a.filemonth as nvarchar(4)) + cast(a.fileday as nvarchar(4)) as date))
option (maxrecursion 0)

Edit for anyone using this answer:

Whilst the recursive CTE will generate the list of dates, it is not the most efficient way to do so. If speed is important for you, use a set based tally table instead:

declare @MinDate date = getdate()-50;
declare @MaxDate date = getdate();

              -- Generate table with 10 rows
with t(t) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
              -- Add row numbers (-1 to start at adding 0 to retain @MinDate value) based on tally table to @MinDate for the number of days +1 (to ensure Min and Max date are included) between the two dates
    ,d(d) as (select top(datediff(day, @MinDate, @MaxDate)+1) dateadd(day,row_number() over (order by (select null))-1,@MinDate)
              from t t1,t t2,t t3,t t4,t t5,t t6    -- Cross join creates 10^6 or 10*10*10*10*10*10 = 1,000,000 row table
              )
select *
from d;
like image 138
iamdave Avatar answered Jan 08 '23 09:01

iamdave