My use case is the following: I'm working on periodic data imports into a PostgreSQL database, where external data has to be imported at regular intervals.
The caveat is that the structure of the data might change from one import to the next, so on every import I truncate the table and drop all of its columns before inserting the new data.
I would like to wrap the entire operation in a single transaction, so that if something goes wrong, the transaction is rolled back and the old data is still accessible (lesser-of-two-evils kind of thing).
As an example, here is what a data import statement might look like:
BEGIN
ALTER TABLE "external_data" DROP "date"
ALTER TABLE "external_data" DROP "column1"
ALTER TABLE "external_data" DROP "column2"
ALTER TABLE "external_data" ADD "date" date DEFAULT NULL
ALTER TABLE "external_data" ADD "column1" text DEFAULT NULL
ALTER TABLE "external_data" ADD "column2" text DEFAULT NULL
ALTER TABLE "external_data" ADD "column3" text DEFAULT NULL
INSERT INTO "external_data" ("date","column1","column2","column3") VALUES ('20170523','Berlin','Chrome','1'),('20170524','Berlin','Chrome','2')
COMMIT
This is currently not working. The INSERT statement gets stuck because, when it's called, the table is still locked from the ALTER TABLE statement that preceded it.
Is there any way to achieve this within Postgres transactions or should I give up and go for some other solution application-side?
"This is currently not working. The INSERT statement gets stuck because, when it's called, the table is still locked from the ALTER TABLE statement that preceded it."
No, a transaction can't block itself that way. The INSERT would be blocked if it were initiated by another transaction, but not by the one that already holds a strong lock on the object. There is no problem with dropping a column and doing a subsequent INSERT in the same transaction.
The reason it seems to be stuck is probably, as mentioned in the comments, that if you feed the sequence of queries from the question to an interactive interpreter, it will not execute any of them, because there is no indication of where each query ends. If the interpreter is psql, the sequence lacks either semicolons or a \g meta-command at the end of each query.
An SQL query by itself does not need a terminating semicolon; it is required only when several queries are submitted together.
You can see a working example at http://rextester.com/OTU89086
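For reference, here is the question's batch with the missing semicolons added. Fed to psql, it executes as a single transaction and the INSERT is not blocked:
BEGIN;
ALTER TABLE "external_data" DROP "date";
ALTER TABLE "external_data" DROP "column1";
ALTER TABLE "external_data" DROP "column2";
ALTER TABLE "external_data" ADD "date" date DEFAULT NULL;
ALTER TABLE "external_data" ADD "column1" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column2" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column3" text DEFAULT NULL;
INSERT INTO "external_data" ("date","column1","column2","column3") VALUES ('20170523','Berlin','Chrome','1'),('20170524','Berlin','Chrome','2');
COMMIT;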
There is a limit on how many columns you can add (even if you are dropping others), because DROP COLUMN does not physically remove a column, it only hides it, and the hidden columns still count toward the limit. If you repeat the drop/add cycle often enough, you get:
54011: tables can have at most 1600 columns
You can see that problem here:
--PostgreSQL 9.6
--'\\' is a delimiter
select version() as postgresql_version;
drop table if exists "external_data";
create table "external_data"(
"date" date,
"column1" integer,
"column2" text,
"column3" boolean
);
BEGIN TRANSACTION;
create or replace function do_the_import()
returns text
language plpgsql as
$body$
begin
ALTER TABLE "external_data" DROP "date";
ALTER TABLE "external_data" DROP "column1";
ALTER TABLE "external_data" DROP "column2";
ALTER TABLE "external_data" DROP "column3";
ALTER TABLE "external_data" ADD "date" date DEFAULT NULL;
ALTER TABLE "external_data" ADD "column1" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column2" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column3" text DEFAULT NULL;
INSERT INTO "external_data" ("date","column1","column2","column3") VALUES ('20170523','Berlin','Chrome','1'),('20170524','Berlin','Chrome','2');
return current_timestamp::text;
end;
$body$;
-- each call drops and re-adds 4 columns; the dropped ones keep counting,
-- so the 1600-column limit is hit well before 1000 iterations
select count(do_the_import()) from generate_series(1,1000);
COMMIT;
Try it here: http://rextester.com/RPER86062
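You can verify that the dropped columns linger in the catalog with a minimal check against the same external_data table (the pg.dropped placeholder names in the output are generated by PostgreSQL itself):
-- dropped columns stay in pg_attribute with attisdropped = true
-- and keep counting toward the 1600-column limit
select attnum, attname, attisdropped
from pg_attribute
where attrelid = 'external_data'::regclass
and attnum > 0
order by attnum;
Because of this, if the import is going to run indefinitely, one way to avoid ever reaching the limit is to recreate the table (DROP TABLE / CREATE TABLE, which are transactional in PostgreSQL just like ALTER TABLE) instead of dropping its columns one by one.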