Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

adding missing date in a table in PostgreSQL

I have a table that contains data for every day in 2002, but it has some missing dates. Namely, 354 records for 2002 (instead of 365). For my calculations, I need to have the missing data in the table with Null values

+-----+------------+------------+
| ID  |  rainfall  | date       |
+-----+------------+------------+
| 100 |  110.2     | 2002-05-06 |
| 101 |  56.6      | 2002-05-07 |
| 102 |  65.6      | 2002-05-09 |
| 103 |  75.9      | 2002-05-10 |
+-----+------------+------------+

you see that 2002-05-08 is missing. I want my final table to be like:

+-----+------------+------------+
| ID  |  rainfall  | date       |
+-----+------------+------------+
| 100 |  110.2     | 2002-05-06 |
| 101 |  56.6      | 2002-05-07 |
| 102 |            | 2002-05-08 |
| 103 |  65.6      | 2002-05-09 |
| 104 |  75.9      | 2002-05-10 |
+-----+------------+------------+

Is there a way to do that in PostgreSQL?

It doesn't matter if I have the result just as a query result (not necessarily an updated table)

like image 655
f.ashouri Avatar asked Nov 30 '22 02:11

f.ashouri


2 Answers

date is a reserved word in standard SQL and the name of a data type in PostgreSQL. PostgreSQL allows it as identifier, but that doesn't make it a good idea. I use thedate as column name instead.

Don't rely on the absence of gaps in a surrogate ID. That's almost always a bad idea. Treat such an ID as unique number without meaning, even if it seems to carry certain other attributes most of the time.

In this particular case, as @Clodoaldo commented, thedate seems to be a perfect primary key and the column id is just cruft - which I removed:

CREATE TEMP TABLE tbl (thedate date PRIMARY KEY, rainfall numeric);
INSERT INTO tbl(thedate, rainfall) VALUES
  ('2002-05-06', 110.2)
, ('2002-05-07', 56.6)
, ('2002-05-09', 65.6)
, ('2002-05-10', 75.9);

Query

Full table by query:

SELECT x.thedate, t.rainfall  -- rainfall automatically NULL for missing rows
FROM (
   SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
   FROM   tbl
   ) x
LEFT   JOIN tbl t USING (thedate)
ORDER  BY x.thedate

Similar to what @a_horse_with_no_name posted, but simplified and ignoring the pruned id.

Fills in gaps between first and last date found in the table. If there can be leading / lagging gaps, extend accordingly. You can use date_trunc() like @Clodoaldo demonstrated - but his query suffers from syntax errors and can be simpler.

INSERT missing rows

The fastest and most readable way to do it is a NOT EXISTS anti-semi-join.

INSERT INTO tbl (thedate, rainfall)
SELECT x.thedate, NULL
FROM (
   SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
   FROM   tbl
   ) x
WHERE NOT EXISTS (SELECT 1 FROM tbl t WHERE t.thedate = x.thedate)
like image 59
Erwin Brandstetter Avatar answered Dec 22 '22 11:12

Erwin Brandstetter


Just do an outer join against a query that returns all dates in 2002:

with all_dates as (
  select date '2002-01-01' + i as date_col
  from generate_series(0, extract(doy from date '2002-12-31')::int - 1) as i
)
select row_number() over (order by ad.date_col) as id, 
       t.rainfall,
       ad.date_col as date
from all_dates ad
  left join your_table t on ad.date_col = t.date
order by ad.date_col;

This will not change your table, it will just produce the result as desired.

Note that the generated id column will not contain the same values as the ID column in your table as it is merely a counter in the result set.

You could also replace the row_number() function with extract(doy from ad.date_col)

like image 41
a_horse_with_no_name Avatar answered Dec 22 '22 11:12

a_horse_with_no_name