Create a query to fill the gaps in a table due to bad data

I have a table with the following schema:

DateTime [Creation Date] PK
int [Data]

The Data column holds values coming from a sensor, something like this:

123
225
354
578
0
2151
2331
0
2555
2678

As you can see, the value always increases.

Due to a problem in the sensor, we occasionally get a 0 between valid values. This causes several problems when we try to use the data, so we want to fill these 0 gaps with something. Ideally we would put in the average of the previous and the following value; if that is not possible, we want to repeat the previous value.

Is this something doable with just a query?

Thanks in advance.

Ignacio Soler Garcia asked Feb 26 '13 11:02


2 Answers

This may not be the most efficient approach, but it should work:

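-- Number the rows in [Creation Date] order so neighbours are rn - 1 and rn + 1.
WITH cte
     AS (SELECT [Creation Date],
                Data,
                rn = ROW_NUMBER() OVER (ORDER BY [Creation Date])
         FROM   dbo.[Table])  -- TABLE is a reserved word, so the name is bracketed
UPDATE cte
-- Replace each 0 with the integer average of the previous and next rows.
SET    Data = ( ( (SELECT c2.Data
                    FROM   cte c2
                    WHERE  c2.rn = cte.rn - 1)
                   + (SELECT c2.Data
                      FROM   cte c2
                      WHERE  c2.rn = cte.rn + 1) ) / 2 )
WHERE  Data = 0;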

I use ROW_NUMBER in a CTE to get consecutive numbers ordered by Creation Date. That number is then used to look up the previous and next rows, whose values are averaged to produce the new Data value.
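To make the arithmetic concrete, here is a minimal Python sketch of the same neighbour-averaging idea, applied to the sample values from the question. It is only an in-memory illustration, not a replacement for the query: `fill_single_gaps` is a hypothetical helper name, and leaving a zero in the first or last position untouched is an assumption of this sketch.

```python
def fill_single_gaps(values):
    """Replace each 0 with the integer average of its immediate neighbours."""
    filled = list(values)
    for i, value in enumerate(values):
        # Only interior zeros have both a previous and a next row.
        if value == 0 and 0 < i < len(values) - 1:
            # Read neighbours from the original list, just as the UPDATE reads
            # the pre-update rows; // mirrors integer division on an int column.
            filled[i] = (values[i - 1] + values[i + 1]) // 2
    return filled


readings = [123, 225, 354, 578, 0, 2151, 2331, 0, 2555, 2678]
print(fill_single_gaps(readings))
# [123, 225, 354, 578, 1364, 2151, 2331, 2443, 2555, 2678]
```

Because only the immediate neighbours are consulted, a run of two or more zeros would average in a neighbouring 0 and give a misleading result.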

Here's a demo with a similar schema (I have used an int instead of the datetime):

Update

Nice one, but it does not handle gaps with multiple 0s

Good catch. Here's the modified SQL, which takes that into account:

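-- Same row numbering, but skip over neighbouring zeros when looking up values.
WITH cte
     AS (SELECT [Creation Date],
                Data,
                rn = ROW_NUMBER() OVER (ORDER BY [Creation Date])
         FROM   dbo.[Table])  -- TABLE is a reserved word, so the name is bracketed
UPDATE cte
SET    Data = ( ( (SELECT c2.Data
                    FROM   cte c2
                    -- closest earlier row with a non-zero value
                    WHERE  c2.rn = (SELECT MAX(c3.rn) FROM cte c3 WHERE c3.rn < cte.rn AND c3.Data <> 0))
                   + (SELECT c2.Data
                      FROM   cte c2
                      -- closest later row with a non-zero value
                      WHERE  c2.rn = (SELECT MIN(c3.rn) FROM cte c3 WHERE c3.rn > cte.rn AND c3.Data <> 0)) ) / 2 )
WHERE  Data = 0;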

Demo (with consecutive zeros on 5,6)
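As a rough cross-check of the nearest-non-zero logic, the following Python sketch applies the same rule to an in-memory list. It also adds the fallback the question asks for (repeat the previous value when there is no later reading), which the query above does not do by itself. `fill_gaps` is a hypothetical name, and leaving a leading zero unchanged is an assumption of this sketch.

```python
def fill_gaps(values):
    """Fill zeros from the nearest non-zero readings on each side."""
    filled = list(values)
    for i, value in enumerate(values):
        if value != 0:
            continue
        # Nearest non-zero reading before and after position i, if any.
        prev = next((v for v in reversed(values[:i]) if v != 0), None)
        nxt = next((v for v in values[i + 1:] if v != 0), None)
        if prev is not None and nxt is not None:
            filled[i] = (prev + nxt) // 2  # integer average, as on an int column
        elif prev is not None:
            filled[i] = prev  # fallback from the question: repeat previous value
        # No earlier reading at all: the 0 is left in place (assumption).
    return filled


print(fill_gaps([354, 578, 0, 0, 2151, 2331, 0]))
# [354, 578, 1364, 1364, 2151, 2331, 2331]
```

Every zero in a run receives the same average, since both lookups read the original values rather than the already-filled ones.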

Tim Schmelter answered Oct 07 '22 17:10


I've got another variation:

-- For each zero row, find the nearest non-zero row on either side,
-- then join back to fetch those rows' values and average them.
SELECT BadDate,
       T1.Data AS PrevData,
       T2.Data AS NextData,
       (T1.Data + T2.Data) / 2 AS AvgValue  -- integer division when Data is int
FROM   (SELECT T1.CreationDate      AS BadDate,
               MAX(T2.CreationDate) AS PrevDate,  -- latest non-zero row before the gap
               MIN(T3.CreationDate) AS NextDate   -- earliest non-zero row after the gap
        FROM   TestData T1,
               TestData T2,
               TestData T3
        WHERE  T1.Data = 0
               AND T2.Data <> 0
               AND T2.CreationDate < T1.CreationDate
               AND T3.Data <> 0
               AND T3.CreationDate > T1.CreationDate
        GROUP  BY T1.CreationDate) DateData
       INNER JOIN TestData T1
               ON DateData.PrevDate = T1.CreationDate
       INNER JOIN TestData T2
               ON DateData.NextDate = T2.CreationDate
Neil Mussett answered Oct 07 '22 15:10