How to tell if record has changed in Postgres

Tags:

I have a bit of an "upsert" type of question... but, I want to throw it out there because it's a little bit different than any that I've read on stackoverflow.

Basic problem.

I'm working on moving from mysql to PostgreSQL 9.1.5 (hosted on Heroku). As a part of that, I need to import multiple CSV files everyday. Some of the data is sales information and is almost guaranteed to be new and need to be inserted. But, other parts of the data is almost guaranteed to be the same. For example, the csv files (note plural) will have POS (point of sale) information in them. This rarely changes (and is most likely only via additions). Then there is product information. There are about 10,000 products (vast majority will be unchanged, but it's possible to have both additions and updates).

The final item (but is important), is that I have a requirement to be able to provide an audit trail/information for any given item. For example, if I add a new POS record, I need to be able to trace that back to the file it was found in. If I change a UPC code or description of a product, then I need to be able to trace it back to the import (and file) where the change came from.

Solution that I'm contemplating.

Since the data is provided to me via CSV, then I'm working around the idea that COPY will be the best/fastest way. The structure of the data in the files is not exactly what I have in the database (i.e. final destination). So, I'm copying them into tables in the staging schema that match the CSV (note: one schema per datasource). The tables in the staging schemas will have a before insert row triggers. These triggers can decide what to do with the data (insert, update or ignore).

For the tables that are most likely to contain new data, then it will try to insert first. If the record is already there, then it will return NULL (and stop the insert into the staging table). For tables that rarely change, then it will query the table and see if the record is found. If it is, then I need a way to see if any of the fields are changed. (because remember, I need to show that the record was modified by import x from file y) I obviously can just boiler plate out the code and test each column. But, was looking for something a little more "eloquent" and more maintainable than that.

In a way, what I'm kind of doing is combining a importing system with an audit trail system. So, in researching audit trails, I reviewed the following wiki.postgresql.org article. It seems like the hstore might be a nice way of getting changes (and being able to easily ignore some columns in the table that aren't important - e.g. "last_modified")

I'm about 90% sure it will all work... I've created some testing tables etc and played around with it.

My question?

Is a better, more maintainable way of accomplishing this task of finding the maybe 3 records out of 10K that require a change to the database. I could certainly write a python script (or something else) that reads the file and tries to figure out what to do with each record, but that feels horribly inefficient and will lead to lots of round trips.

A few final things:

I don't have control over the input files. I would love it if they only sent me the deltas, but they don't and it's completely outside of my control or influence.
he system is grow and new data sources are likely to be added that will greatly increase the amount of data being processed (so, I'm trying to keep things efficient)
I know this is not nice, simple SO question (like "how to sort a list in python") but I believe one of the great things about SO is that you can ask hard questions and people will share their thoughts about how they think the best way to solve it is.

411

asked Sep 19 '12 20:09

David S

1 Answers

I have lots of similar operations. What I do is COPY to temporary staging tables:

CREATE TEMP TABLE target_tmp AS
SELECT * FROM target_tbl LIMIT 0;  -- only copy structure, no data

COPY target_tmp FROM '/path/to/target.csv';

For performance, run ANALYZE - temp. tables are not analyzed by autovacuum!

ANALYZE target_tmp;

Also for performance, maybe even create an index or two on the temp table, or add a primary key if the data allows for that.

ALTER TABLE ADD CONSTRAINT target_tmp_pkey PRIMARY KEY(target_id);

You don't need the performance stuff for small imports.

Then use the full scope of SQL commands to digest the new data.
For instance, if the primary key of the target table is target_id ..

Maybe DELETE what isn't there any more?

DELETE FROM target_tbl t
WHERE NOT EXISTS (
   SELECT 1 FROM target_tmp t1
   WHERE  t1.target_id = t.target_id
);

Then UPDATE what's already there:

UPDATE target_tbl t
SET    col1 = t1.col1
FROM   target_tmp t1
WHERE  t.target_id = t1.target_id

To avoid empty UPDATEs, simply add:

...
AND    col1 IS DISTINCT FROM t1.col1; -- repeat for relevant columns

Or, if the whole row is relevant:

...
AND    t IS DISTINCT FROM t1;         -- check the whole row

Then INSERT what's new:

INSERT INTO target_tbl(target_id, col1)
SELECT t1.target_id, t1.col1
FROM   target_tmp t1
LEFT   JOIN target_tbl t USING (target_id)
WHERE  t.target_id IS NULL;

Clean up if your session goes on (temp tables are dropped at end of session automatically):

DROP TABLE target_tmp;

Or use ON COMMIT DROP or similar with CREATE TEMP TABLE.
_{Code untested, but should work in any modern version of PostgreSQL except for typos.}

123

answered Nov 15 '22 04:11

Erwin Brandstetter

Related questions
                            
                                Prevent delete/update for tables by even superadmin/dba?
                            
                                postgresql: how list indexed columns?
                            
                                ActiveRecord unable to reconnect when running inside forked/threaded application?
                            
                                How to delete the highest value in a group
                            
                                SQL - How to escape parentheses in PostgreSQL
                            
                                Weird id values in Postgresql with JPA 2 Sequence generation
                            
                                Saving Base64 String in Rails
                            
                                Postgresql - Clean way to insert records if they don't exist, update if they do
                            
                                Converting a ruby on rails app from sqlite to postgresql
                            
                                Performance of postgresql with large numbers of tables (EG: 1 million tables)?
                            
                                Unicorn! and PostgreSQL
                            
                                ERROR: function unnest(integer[]) does not exist in postgresql
                            
                                SQL: Difference between "BETWEEN" vs "current_date - number"
                            
                                What is the proper way to manually sequence a column in Postgres?
                            
                                Time not updating in heroku
                            
                                Cumulative Monthly Totals and Postgresql
                            
                                Running PostgreSQL with Supervisord
                            
                                Can't start Homebrew PostgreSQL install on Mac OS X - "Permission Denied" [closed]
                            
                                Can a multi-field UNIQUE constraint include fields from more than one table?
                            
                                How to pass a php array to a postgres function

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to tell if record has changed in Postgres

Tags:

file-io

postgresql

postgresql-9.1

postgresql-copy

upsert

David S

People also ask

1 Answers

Erwin Brandstetter

Recent Activity

Donate For Us