Temporary tables bloating pg_attribute

I'm using COPY to insert large batches of data into our database from CSVs. The insert looks something like this:

-- This tmp table will contain all the items that we want to try to insert
CREATE TEMP TABLE tmp_items
(
    field1 INTEGER NULL,
    field2 INTEGER NULL,
    ...
) ON COMMIT DROP;

COPY tmp_items(
    field1,
    field2,
    ...
) FROM 'path\to\data.csv' WITH (FORMAT csv);

-- Start inserting some items
WITH newitems AS (
    INSERT INTO items (field1, field2)
    SELECT tmpi.field1, tmpi.field2
    FROM tmp_items tmpi
    WHERE some condition

    -- Return the new id and other fields to the next step
    RETURNING id AS newid, field1 AS field1
)
-- Insert the result into another temp table
INSERT INTO tmp_newitems SELECT * FROM newitems;

-- Use tmp_newitems to update other tables
etc....

We will then use the data in tmp_items to do multiple inserts into multiple tables. We check for duplicates and manipulate the data in a few ways before inserting, so not everything in tmp_items will be used or inserted as-is. We do this with a combination of CTEs and more temporary tables.
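For illustration, one of these duplicate checks looks roughly like the following (the target table and column names here are made up for the example, not our actual schema):

-- Hypothetical sketch: only insert rows that are not already present in the target table
INSERT INTO other_table (item_id, field1)
SELECT ni.newid, ni.field1
FROM tmp_newitems ni
WHERE NOT EXISTS (
    SELECT 1
    FROM other_table ot
    WHERE ot.item_id = ni.newid
);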

This works very well and is fast enough for our needs. We run loads of these, and the problem is that pg_attribute becomes very bloated quite fast and autovacuum doesn't seem to be able to keep up (it also consumes a lot of CPU).

My questions are:

  1. Is it possible to perform this kind of insert without using temp tables?
  2. If not, should we just make autovacuum of pg_attribute more aggressive? Won't that take up just as much CPU, or more?
asked May 16 '18 by Joel


2 Answers

The best solution would be to create your temporary tables at session start with

CREATE TEMPORARY TABLE ... (
   ...
) ON COMMIT DELETE ROWS;

Then the temporary tables would be kept for the duration of the session but emptied at every commit.

This will reduce the bloat of pg_attribute considerably, and bloating shouldn't be a problem any more.
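Applied to the table from the question, a sketch would look like this (assuming the same columns; run it once right after connecting, not once per batch):

CREATE TEMP TABLE tmp_items
(
    field1 INTEGER NULL,
    field2 INTEGER NULL
) ON COMMIT DELETE ROWS;

-- Each batch then just runs COPY / INSERT and commits;
-- the COMMIT empties tmp_items but keeps its definition,
-- so pg_attribute is not touched again.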

You could also join the dark side (be warned, this is unsupported):

  • Start PostgreSQL with

    pg_ctl start -o -O
    

    so that you can modify system catalogs.

  • Connect as superuser and run

    UPDATE pg_catalog.pg_class
    SET reloptions = ARRAY['autovacuum_vacuum_cost_delay=0']
    WHERE oid = 'pg_catalog.pg_attribute'::regclass;
    

Now autovacuum will run much more aggressively on pg_attribute, and that will probably take care of your problem.
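You can check that the setting is in place with (just a sanity check, not strictly necessary):

    SELECT relname, reloptions
    FROM pg_catalog.pg_class
    WHERE oid = 'pg_catalog.pg_attribute'::regclass;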

Mind that the setting will be gone after a major upgrade.

answered Oct 31 '22 by Laurenz Albe


I know this is an old question, but somebody might find this useful in the future.

So we're very heavy on temp tables (>500 rps and async I/O via Node.js) and thus experienced very heavy bloating of pg_attribute because of that. All you are left with is very aggressive vacuuming, which halts performance. None of the answers given here solve this, because dropping and recreating temp tables bloats pg_attribute heavily, and so one sunny morning you will find database performance dead and pg_attribute at 200+ GB while your database is around 10 GB.

So the elegant solution is this:

create temp table if not exists my_temp_table ( /* column definitions */ ) on commit delete rows;

So you can keep using temp tables, spare your pg_attribute, avoid the dark side and the heavy vacuuming, and still get the desired performance.
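A minimal sketch of how this fits into each batch (the column names are illustrative):

-- Safe to run at the start of every batch: apart from a NOTICE it is a no-op
-- when the table already exists in the current session
create temp table if not exists my_temp_table
(
    field1 integer,
    field2 integer
) on commit delete rows;

-- ... COPY / INSERT / UPDATE work here ...

commit;  -- empties my_temp_table but keeps its definition, so pg_attribute stays clean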

Don't forget to clean up the existing bloat once:

vacuum full pg_depend;
vacuum full pg_attribute;

Cheers :)

answered Oct 30 '22 by Ivan Kolyhalov