I need to perform a query 2.5 million times. Each query generates some rows over which I need to compute AVG(column), then use that average to filter out all rows below it. I then need to INSERT these filtered results into a table.
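Concretely, each of the 2.5 million iterations would look something like this (a minimal sketch; the source table src, its column value, the batch_id filter, and the destination table dst are all hypothetical stand-ins):

    CREATE TEMPORARY TABLE tmp_rows AS
        SELECT id, value FROM src WHERE batch_id = 42;  -- the per-iteration query

    INSERT INTO dst (id, value)
    SELECT id, value
    FROM tmp_rows
    WHERE value >= (SELECT AVG(value) FROM tmp_rows);  -- keep only rows at or above average

    DROP TABLE tmp_rows;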
The only way to do this with reasonable efficiency seems to be to create a TEMPORARY TABLE for each query-postmaster python-thread. I am hoping these TEMPORARY TABLEs will not be persisted to hard drive at all and will remain in memory (RAM), unless they run out of working memory, of course.
I would like to know whether a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTs, i.e. slow the whole process down).
A temporary table, as the name implies, is a short-lived table that exists for the duration of a database session. PostgreSQL automatically drops temporary tables at the end of a session or, optionally, at the end of the current transaction.
Temporary tables get put into a schema called "pg_temp_NNN", where "NNN" indicates which server backend you're connected to. This is implicitly added to your search path in the session that creates them.
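You can see this for yourself (a small sketch; the table name t is arbitrary, and the NNN suffix will vary with the backend you happen to be connected to):

    CREATE TEMPORARY TABLE t (x int);

    SELECT schemaname, tablename
    FROM pg_tables
    WHERE tablename = 't';
    -- schemaname will be something like pg_temp_3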
In PostgreSQL, one drops a temporary table with the ordinary DROP TABLE statement. Syntax: DROP TABLE temp_table_name; Unlike the CREATE TABLE statement, DROP TABLE has no TEMP or TEMPORARY keyword specifically for temporary tables.
Please note that, in Postgres, the default behaviour for temporary tables is that they are not dropped on commit and their data persists across transactions; see the ON COMMIT clause of CREATE TABLE.
Temporary tables are, however, dropped at the end of a database session:
Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction.
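The ON COMMIT clause makes the per-transaction behaviour explicit. A short sketch of the three options:

    -- default: table and its rows survive commit (ON COMMIT PRESERVE ROWS)
    CREATE TEMPORARY TABLE t1 (x int);

    -- table is dropped when the enclosing transaction commits
    CREATE TEMPORARY TABLE t2 (x int) ON COMMIT DROP;

    -- table survives, but its rows are deleted at every commit
    CREATE TEMPORARY TABLE t3 (x int) ON COMMIT DELETE ROWS;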
There are multiple considerations you have to take into account:

- If you want a temporary table to be DROPped automatically at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
- If your connections are pooled, a single database session may span multiple client sessions, so to avoid clashes in CREATE you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions, e.g. if the connection is used in auto-commit mode). A combined sketch of this pattern follows the list.
- The temp_buffers option in postgresql.conf controls how much memory a session may use for temporary tables before their pages spill to disk.
- Temporary tables are not processed by the autovacuum daemon (auto_vacuum), so vacuum and analyze them yourself when needed.

Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost-based optimizer will assume that a newly created temp table has ~1000 rows, and this may result in poor performance should the temp table actually contain millions of rows.
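Putting the pooling-safe pattern, temp_buffers, and the ANALYZE advice together (a minimal sketch, not a drop-in solution; table and column names are the same hypothetical ones as above):

    -- must be set before the session first touches a temp table
    SET temp_buffers = '256MB';

    BEGIN;

    DROP TABLE IF EXISTS tmp_rows;  -- also safe outside a transaction, e.g. in auto-commit mode
    CREATE TEMPORARY TABLE tmp_rows ON COMMIT DROP AS
        SELECT id, value FROM src WHERE batch_id = 42;

    CREATE INDEX ON tmp_rows (value);
    ANALYZE tmp_rows;  -- give the planner real row counts instead of the ~1000-row default

    INSERT INTO dst (id, value)
    SELECT id, value
    FROM tmp_rows
    WHERE value >= (SELECT AVG(value) FROM tmp_rows);

    COMMIT;  -- tmp_rows is dropped here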
Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table you'll probably have most of your data in the backing store. For a large table I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.
EDIT: If you absolutely need RAM-only temporary tables, you can create a tablespace for your database on a RAM disk (/dev/shm works). This reduces the amount of disk IO, but beware that it is currently not possible to avoid physical disk writes entirely; the DB engine will flush the table list to stable storage when you create the temporary table.
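A sketch of that idea (the directory and tablespace name are hypothetical; the directory must exist, be empty, and be owned by the postgres OS user, and anything under /dev/shm is lost on reboot):

    CREATE TABLESPACE ramspace LOCATION '/dev/shm/pg_ram';

    -- route this session's temporary objects to the RAM disk
    SET temp_tablespaces = 'ramspace';

    CREATE TEMPORARY TABLE t (x int);  -- now backed by the RAM disk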