 

How to import *huge* chunks of data to PostgreSQL?

I have a data structure that looks like this:

Model Place
    primary key "id"

    foreign key "parent" -> Place
    foreign key "neighbor" -> Place (symmetryc)
    foreign key "belongtos" -> Place (asymmetric)

    a bunch of scalar fields ...

I have over 5 million rows in the model table, and I need to insert ~50 million rows into each of the two foreign key tables. I have SQL files that look like this:

INSERT INTO place_belongtos (from_place_id, to_place_id) VALUES (123, 456);

and they are about 7 GB each. The problem is, when I do psql < belongtos.sql, it takes me about 12 hours to import ~4 million rows on my AMD Turion64x2 CPU. The OS is Gentoo ~amd64, PostgreSQL is version 8.4, compiled locally. The data dir is a bind mount, located on my second extended partition (ext4), which I believe is not the bottleneck.

I suspect it takes so long to insert the foreign key relations because PostgreSQL checks the key constraints for each row, which adds unnecessary overhead here, as I know for sure that the data is valid. Is there a way to speed up the import, e.g. by temporarily disabling the constraint checks?
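For example, I imagine something like the following (just a sketch; the constraint and referenced table names are my guesses), dropping the checks before the import and re-creating them afterwards:

-- Drop the foreign key checks before running the big INSERT script ...
ALTER TABLE place_belongtos DROP CONSTRAINT place_belongtos_from_place_id_fkey;
ALTER TABLE place_belongtos DROP CONSTRAINT place_belongtos_to_place_id_fkey;

-- ... import belongtos.sql here ...

-- ... and re-create them once the data is in.
ALTER TABLE place_belongtos
    ADD CONSTRAINT place_belongtos_from_place_id_fkey
    FOREIGN KEY (from_place_id) REFERENCES place (id);
ALTER TABLE place_belongtos
    ADD CONSTRAINT place_belongtos_to_place_id_fkey
    FOREIGN KEY (to_place_id) REFERENCES place (id);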

Attila O. asked Aug 09 '10

People also ask

Can Postgres handle 100 million rows?

If you're simply filtering the data and it fits in memory, Postgres is capable of parsing roughly 5-10 million rows per second (assuming a reasonable row size of, say, 100 bytes). If you're aggregating, you're at about 1-2 million rows per second.

How big is too big for a Postgres database?

There is no PostgreSQL-imposed limit on the number of indexes you can create on a table. Of course, performance may degrade if you choose to create more and more indexes on a table with more and more columns. PostgreSQL has a limit of 1GB for the size of any one field in a table.

Which command is the most efficient way to bulk-load data into a Postgres table from a CSV file?

The go-to solution for bulk loading into PostgreSQL is the native COPY command.
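As a rough illustration, using the table from the question above (the file path is a placeholder):

-- Server-side load; the file must be readable by the PostgreSQL server process:
COPY place_belongtos (from_place_id, to_place_id) FROM '/path/to/belongtos.csv' WITH CSV;

-- From psql, the equivalent client-side load:
-- \copy place_belongtos (from_place_id, to_place_id) FROM 'belongtos.csv' WITH CSV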


1 Answer

  1. Make sure both foreign key constraints are DEFERRABLE.
  2. Use COPY to load your data.
  3. If you can't use COPY, use a prepared statement for your INSERT.
  4. Proper configuration settings will also help; check the WAL settings.

Rough sketches of each point follow below.
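For points 1 and 2, assuming the constraints have already been created (or re-created) as DEFERRABLE and the data has been converted to CSV (the file path is a placeholder):

BEGIN;

-- Check the foreign keys once at COMMIT instead of once per inserted row.
SET CONSTRAINTS ALL DEFERRED;

-- Bulk-load the rows; far faster than millions of individual INSERT statements.
COPY place_belongtos (from_place_id, to_place_id) FROM '/path/to/belongtos.csv' WITH CSV;

COMMIT;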
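For point 3, if the data has to stay as INSERT statements, a prepared statement avoids re-planning each one (the statement name is arbitrary, and the column types are assumed to be integer):

PREPARE ins_belongto (integer, integer) AS
    INSERT INTO place_belongtos (from_place_id, to_place_id) VALUES ($1, $2);

EXECUTE ins_belongto(123, 456);
EXECUTE ins_belongto(123, 789);
-- ... one EXECUTE per row, ideally all wrapped in a single transaction ...

DEALLOCATE ins_belongto;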
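And for point 4, postgresql.conf settings along these lines are commonly tuned for one-off bulk loads on 8.4 (the values are only examples, not recommendations for this particular machine):

# Fewer, larger checkpoints during the load (8.4-era setting):
checkpoint_segments = 32
wal_buffers = 8MB
# Speeds up index builds and foreign key validation afterwards:
maintenance_work_mem = 256MB
# Only if losing the last few transactions after a crash is acceptable:
synchronous_commit = off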
Frank Heikens answered Sep 23 '22