Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find the last time that a PostgreSQL database has been updated?

Tags:

I am working with a postgreSQL database that gets updated in batches. I need to know when the last time that the database (or a table in the database)has been updated or modified, either will do.

I saw that someone on the postgeSQL forum had suggested that to use logging and query your logs for the time. This will not work for me as that I do not have control over the clients codebase.

like image 940
WolfmanDragon Avatar asked May 22 '09 18:05

WolfmanDragon


People also ask

How do you check when was the table last updated in postgres?

You can do it via checking last modification time of table's file. In postgresql,every table correspond one or more os files,like this: select relfilenode from pg_class where relname = 'test'; the relfilenode is the file name of table "test".

How do I check PostgreSQL history?

There's no history in the database itself, if you're using psql you can use "\s" to see your command history there. You can get future queries or other types of operations into the log files by setting log_statement in the postgresql. conf file.

What is the current date in PostgreSQL?

The PostgreSQL CURRENT_DATE function returns the current date (the system date on the machine running PostgreSQL) as a value in the 'YYYY-MM-DD' format. In this format, 'YYYY' is a 4-digit year, 'MM' is a 2-digit month, and 'DD' is a 2-digit day. The returned value is a date data type.

What is the update query in PostgreSQL?

The PostgreSQL UPDATE Query is used to modify the existing records in a table. You can use WHERE clause with UPDATE query to update the selected rows. Otherwise, all the rows would be updated.


1 Answers

You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also update the time in a central location if you don't want to change the existing tables.

So for example a typical way is the following one:

CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  NEW.last_updated := now();
  RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
  BEFORE INSERT OR UPDATE ON sometable
  FOR EACH ROW EXECUTE PROCEDURE stamp_updated();

Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:

SELECT MAX(max_last_updated) FROM (
  SELECT MAX(last_updated) AS max_last_updated FROM sometable
  UNION ALL
  SELECT MAX(last_updated) FROM someothertable
) updates

For tables with a serial (or similarly-generated) primary key, you can try avoid the sequential scan to find the latest update time by using the primary key index, or you create indices on last_updated.

-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1

Note that this can give slightly wrong results in the case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created.)

An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:

CREATE TABLE update_log(table_name text PRIMARY KEY, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
  RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
 AFTER INSERT OR UPDATE ON sometable
 FOR EACH STATEMENT EXECUTE stamp_update_log();

This will give you a table with a row for each table update: you can then just do:

SELECT MAX(updated) FROM update_log

To get the last update time. (You could split this out by table if you wanted). This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast) or truncate it periodically if that fits with your use case, (e.g. take an exclusive lock on the table, get the latest update time, then truncate it if you need to periodically check if changes have been made).

An alternative approach- which might be what the folks on the forum meant- is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or on the database or user you need to track) and then all statements that modify the database will be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, etc.

like image 80
araqnid Avatar answered Oct 27 '22 16:10

araqnid